W0731 02:11:05.467000 139951240857408 torch/distributed/run.py:757]
W0731 02:11:05.467000 139951240857408 torch/distributed/run.py:757] *****************************************
W0731 02:11:05.467000 139951240857408 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0731 02:11:05.467000 139951240857408 torch/distributed/run.py:757] *****************************************
[2024-07-31 02:11:06,949] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,949] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,952] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,952] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,954] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,955] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,964] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-31 02:11:06,964] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
Replace train sampler!!
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. Using PIL to load images.
Replace train sampler!!
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
[2024-07-31 02:11:10,884] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-31 02:11:10,884] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
07/31/2024 02:11:10 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
07/31/2024 02:11:10 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=/data/jcy/project/InternVL/internvl_chat/zero_stage3_config2.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/runs/Jul31_02-11-10_e028538ab8e8,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
output_dir=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=2,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.05,
)
07/31/2024 02:11:10 - INFO - __main__ - Loading Tokenizer: /data/jcy/ckpt/internvl-chat-v1-5
[INFO|tokenization_utils_base.py:2025] 2024-07-31 02:11:10,912 >> loading file ./tokenizer.model
[INFO|tokenization_utils_base.py:2025] 2024-07-31 02:11:10,912 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-07-31 02:11:10,912 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-07-31 02:11:10,912 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-07-31 02:11:10,912 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-07-31 02:11:11,060 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/31/2024 02:11:11 - INFO - __main__ - Loading InternVLChatModel...
[INFO|configuration_utils.py:727] 2024-07-31 02:11:11,173 >> loading configuration file /data/jcy/ckpt/internvl-chat-v1-5/config.json
[INFO|configuration_utils.py:792] 2024-07-31 02:11:11,174 >> Model config InternVLChatConfig {
  "_commit_hash": null,
  "architectures": [
    "InternVLChatModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internvl_chat.InternVLChatConfig",
    "AutoModel": "modeling_internvl_chat.InternVLChatModel",
    "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel"
  },
  "downsample_ratio": 0.5,
  "dynamic_image_size": true,
  "force_image_size": 448,
  "llm_config": {
    "_name_or_path": "internlm/internlm2-chat-20b",
    "add_cross_attention": false,
    "architectures": [
      "InternLM2ForCausalLM"
    ],
    "attn_implementation": "flash_attention_2",
    "auto_map": {
      "AutoConfig": "configuration_internlm2.InternLM2Config",
      "AutoModel": "modeling_internlm2.InternLM2ForCausalLM",
      "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM"
    },
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bias": false,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "silu",
    "hidden_size": 6144,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 16384,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 32768,
    "min_length": 0,
    "model_type": "internlm2",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 48,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 2,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
      "factor": 3.0,
      "type": "dynamic"
    },
    "rope_theta": 1000000,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_cache": true,
    "vocab_size": 92553
  },
  "max_dynamic_patch": 12,
  "min_dynamic_patch": 1,
  "model_type": "internvl_chat",
  "pad2square": false,
  "ps_version": "v2",
  "select_layer": -1,
  "system_message": "You are an AI assistant whose name is InternLM (\u4e66\u751f\u00b7\u6d66\u8bed).",
  "template": "internlm2-chat",
  "torch_dtype": "bfloat16",
  "transformers_version": null,
  "use_backbone_lora": 0,
  "use_llm_lora": 0,
  "use_thumbnail": true,
  "vision_config": {
    "_name_or_path": "",
    "add_cross_attention": false,
    "architectures": [
      "InternVisionModel"
    ],
    "attention_dropout": 0.0,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "drop_path_rate": 0.0,
    "dropout": 0.0,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "gelu",
    "hidden_size": 3200,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "image_size": 448,
    "initializer_factor": 0.1,
    "initializer_range": 1e-10,
    "intermediate_size": 12800,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_eps": 1e-06,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_type": "intern_vit_6b",
    "no_repeat_ngram_size": 0,
    "norm_type": "rms_norm",
    "num_attention_heads": 25,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_channels": 3,
    "num_hidden_layers": 45,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "patch_size": 14,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "qk_normalization": true,
    "qkv_bias": false,
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_flash_attn": true
  }
}
07/31/2024 02:11:11 - INFO - __main__ - Using flash_attention_2 for InternLM
[INFO|modeling_utils.py:3473] 2024-07-31 02:11:11,176 >> loading weights file /data/jcy/ckpt/internvl-chat-v1-5/model.safetensors.index.json
[INFO|modeling_utils.py:1426] 2024-07-31 02:11:11,177 >> Instantiating InternVLChatModel model under default dtype torch.bfloat16.
[INFO|modeling_utils.py:3582] 2024-07-31 02:11:11,177 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|configuration_utils.py:826] 2024-07-31 02:11:11,187 >> Generate config GenerationConfig {}
[2024-07-31 02:11:11,300] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-31 02:11:11,303] [INFO] [comm.py:637:init_distributed] cdb=None
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
[2024-07-31 02:11:11,317] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-31 02:11:11,319] [INFO] [comm.py:637:init_distributed] cdb=None
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
[2024-07-31 02:11:11,325] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-31 02:11:11,326] [INFO] [comm.py:637:init_distributed] cdb=None
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False
[2024-07-31 02:11:11,332] [INFO] [comm.py:637:init_distributed] cdb=None
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False
07/31/2024 02:11:11 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
[WARNING|logging.py:314] 2024-07-31 02:11:11,470 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,473 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,484 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,493 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,498 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,509 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-31 02:11:11,513 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:826] 2024-07-31 02:11:13,387 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2
}
[2024-07-31 02:11:13,612] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 934, num_elems = 25.51B
Loading checkpoint shards: 0%| | 0/11 [00:00
> All model checkpoint weights were used when initializing InternVLChatModel.
[INFO|modeling_utils.py:4358] 2024-07-31 02:11:27,742 >> All the weights of InternVLChatModel were initialized from the model checkpoint at /data/jcy/ckpt/internvl-chat-v1-5.
If your task is similar to the task the model of the checkpoint was trained on, you can already use InternVLChatModel for predictions without further training.
[INFO|configuration_utils.py:779] 2024-07-31 02:11:27,750 >> loading configuration file /data/jcy/ckpt/internvl-chat-v1-5/generation_config.json
[INFO|configuration_utils.py:826] 2024-07-31 02:11:27,750 >> Generate config GenerationConfig {}
07/31/2024 02:11:27 - INFO - __main__ - Finished
07/31/2024 02:11:27 - INFO - __main__ - model.config.force_image_size: 448
07/31/2024 02:11:27 - INFO - __main__ - data_args.force_image_size: 448
07/31/2024 02:11:27 - INFO - __main__ - model.config.vision_config.image_size: 448
07/31/2024 02:11:27 - INFO - __main__ - [Dataset] num_image_token: 256
07/31/2024 02:11:27 - INFO - __main__ - [Dataset] dynamic_image_size: True
07/31/2024 02:11:27 - INFO - __main__ - [Dataset] use_thumbnail: True
07/31/2024 02:11:27 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
07/31/2024 02:11:27 - INFO - __main__ - Formatting inputs...Skip in lazy mode
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,274 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,301 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,328 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,358 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,421 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:34,462 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:35,060 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-31 02:11:35,178 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.15230607986450195 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.10260581970214844 seconds
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module fused_adam...
07/31/2024 02:11:37 - INFO - __main__ - Add dataset: caption with length: 157445
Time to load fused_adam op: 0.1389765739440918 seconds
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.tok_embeddings.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.0.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.1.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.2.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.3.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.4.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.5.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.6.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.attention_norm.weight
Loading extension module fused_adam...
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.7.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.8.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.9.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.10.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - 
language_model.model.layers.10.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.11.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.12.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.13.ffn_norm.weight 07/31/2024 
02:11:37 - INFO - __main__ - language_model.model.layers.14.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.14.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.15.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.16.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - 
language_model.model.layers.17.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.17.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.18.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.19.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.attention.wqkv.weight 
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.feed_forward.w3.weight
Time to load fused_adam op: 0.10245585441589355 seconds
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.20.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.21.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.22.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.23.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.24.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.25.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - 
language_model.model.layers.26.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.26.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.27.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.28.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.attention.wo.weight 07/31/2024 
02:11:37 - INFO - __main__ - language_model.model.layers.29.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.29.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.30.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.31.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - 
language_model.model.layers.32.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.32.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.33.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.feed_forward.w1.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.feed_forward.w3.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.feed_forward.w2.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.attention_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.34.ffn_norm.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.attention.wqkv.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.attention.wo.weight 07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.feed_forward.w1.weight 
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.35.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.36.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.37.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.38.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.39.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.40.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.41.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.42.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.43.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.44.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.45.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.46.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.attention.wqkv.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.attention.wo.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.feed_forward.w1.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.feed_forward.w3.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.feed_forward.w2.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.attention_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.layers.47.ffn_norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.model.norm.weight
07/31/2024 02:11:37 - INFO - __main__ - language_model.output.weight
07/31/2024 02:11:37 - INFO - __main__ - mlp1.0.weight
07/31/2024 02:11:37 - INFO - __main__ - mlp1.0.bias
07/31/2024 02:11:37 - INFO - __main__ - mlp1.1.weight
07/31/2024 02:11:37 - INFO - __main__ - mlp1.1.bias
07/31/2024 02:11:37 - INFO - __main__ - mlp1.3.weight
07/31/2024 02:11:37 - INFO - __main__ - mlp1.3.bias
[INFO|trainer.py:571] 2024-07-31 02:11:37,778 >> Using auto half precision backend
Loading extension module fused_adam...
Time to load fused_adam op: 0.10237693786621094 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.3027455806732178 seconds
[2024-07-31 02:11:37,958] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-07-31 02:11:37,981] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.1339399814605713 seconds
[2024-07-31 02:11:38,532] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2024-07-31 02:11:38,532] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-07-31 02:11:38,587] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-07-31 02:11:38,587] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2024-07-31 02:11:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2024-07-31 02:11:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
[2024-07-31 02:11:38,767] [INFO] [utils.py:791:see_memory_usage] Stage 3 initialize beginning
[2024-07-31 02:11:38,767] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 8.81 GB CA 6.97 GB Max_CA 9 GB
[2024-07-31 02:11:38,768] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.11 GB, percent = 2.9%
[2024-07-31 02:11:38,772] [INFO] [stage3.py:127:__init__] Reduce bucket size 1000000000
[2024-07-31 02:11:38,772] [INFO] [stage3.py:128:__init__] Prefetch bucket size 1000000000
Loading extension module fused_adam...
Time to load fused_adam op: 0.14381694793701172 seconds
[2024-07-31 02:11:38,951] [INFO] [utils.py:791:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-07-31 02:11:38,952] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB
[2024-07-31 02:11:38,952] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.11 GB, percent = 2.9%
Parameter Offload: Total persistent parameters: 7529856 in 510 params
[2024-07-31 02:11:39,189] [INFO] [utils.py:791:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-07-31 02:11:39,190] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB
[2024-07-31 02:11:39,190] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.11 GB, percent = 2.9%
[2024-07-31 02:11:39,389] [INFO] [utils.py:791:see_memory_usage] Before creating fp16 partitions
[2024-07-31 02:11:39,390] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB
[2024-07-31 02:11:39,390] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.11 GB, percent = 2.9%
[2024-07-31 02:11:43,462] [INFO] [utils.py:791:see_memory_usage] After creating fp16 partitions: 3
[2024-07-31 02:11:43,463] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 10.76 GB Max_CA 11 GB
[2024-07-31 02:11:43,463] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 49.03 GB, percent = 4.9%
[2024-07-31 02:11:43,653] [INFO] [utils.py:791:see_memory_usage] Before creating fp32 partitions
[2024-07-31 02:11:43,653] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 10.76 GB Max_CA 11 GB
[2024-07-31 02:11:43,654] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 40.13 GB, percent = 4.0%
[2024-07-31 02:11:43,847] [INFO] [utils.py:791:see_memory_usage] After creating fp32 partitions
[2024-07-31 02:11:43,847] [INFO] [utils.py:792:see_memory_usage] MA 16.0 GB Max_MA 16.91 GB CA 21.99 GB Max_CA 22 GB
[2024-07-31 02:11:43,848] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.64 GB, percent = 3.0%
[2024-07-31 02:11:44,127] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-07-31 02:11:44,128] [INFO] [utils.py:792:see_memory_usage] MA 16.0 GB Max_MA 16.0 GB CA 21.99 GB Max_CA 22 GB
[2024-07-31 02:11:44,128] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.13 GB, percent = 2.9%
[2024-07-31 02:11:44,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | init_optimizer_state: 58.03
[2024-07-31 02:11:44,407] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-07-31 02:11:44,408] [INFO] [utils.py:792:see_memory_usage] MA 34.6 GB Max_MA 38.35 GB CA 42.53 GB Max_CA 43 GB
[2024-07-31 02:11:44,408] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.14 GB, percent = 2.9%
[2024-07-31 02:11:44,408] [INFO] [stage3.py:479:_setup_for_real_optimizer] optimizer state initialized
[2024-07-31 02:11:44,829] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-07-31 02:11:44,830] [INFO] [utils.py:792:see_memory_usage] MA 41.12 GB Max_MA 43.24 GB CA 48.24 GB Max_CA 48 GB
[2024-07-31 02:11:44,830] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 29.14 GB, percent = 2.9%
[2024-07-31 02:11:44,830] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2024-07-31 02:11:44,830] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupCosineLR
[2024-07-31 02:11:44,830] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2024-07-31 02:11:44,830] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05], mom=[[0.9, 0.999]]
[2024-07-31 02:11:44,833] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] amp_enabled .................. False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] amp_params ................... False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] bfloat16_enabled ............. True
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] comms_config .................
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] communication_data_type ...... None
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] curriculum_params_legacy ..... False
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-07-31 02:11:44,833] [INFO] [config.py:988:print] data_efficiency_enabled ...... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] dataloader_drop_last ......... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] disable_allgather ............ False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] dump_state ................... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_enabled ........... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] eigenvalue_verbose ........... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] elasticity_enabled ........... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] fp16_auto_cast ............... None
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] fp16_enabled ................. False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] global_rank .................. 0
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] grad_accum_dtype ............. None
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] gradient_accumulation_steps .. 8
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] gradient_clipping ............ 1.0
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] graph_harvesting ............. False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] load_universal_checkpoint .... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] loss_scale ................... 1.0
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] memory_breakdown ............. False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] mics_hierarchial_params_gather False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] mics_shard_size .............. -1
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] optimizer_name ............... adamw
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] optimizer_params ............. {'lr': 2e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.05}
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] pld_enabled .................. False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] pld_params ................... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] prescale_gradients ........... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] scheduler_name ............... WarmupCosineLR
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] scheduler_params ............. {'warmup_min_ratio': 0, 'cos_min_ratio': 0, 'warmup_num_steps': 37, 'warmup_type': 'linear', 'total_num_steps': 1230}
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] sparse_attention ............. None
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] steps_per_print .............. inf
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] train_batch_size ............. 128
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 2
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] use_data_before_expert_parallel_ False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] use_node_local_storage ....... False
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] wall_clock_breakdown ......... True
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] weight_quantization_config ... None
[2024-07-31 02:11:44,834] [INFO] [config.py:988:print] world_size ................... 8
[2024-07-31 02:11:44,835] [INFO] [config.py:988:print] zero_allow_untested_optimizer False
[2024-07-31 02:11:44,835] [INFO] [config.py:988:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=1000000000 param_persistence_threshold=10000000 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-07-31 02:11:44,835] [INFO] [config.py:988:print] zero_enabled ................. True
[2024-07-31 02:11:44,835] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. True
[2024-07-31 02:11:44,835] [INFO] [config.py:988:print] zero_optimization_stage ......
3 [2024-07-31 02:11:44,835] [INFO] [config.py:974:print_user_config] json = { "zero_optimization": { "stage": 3, "overlap_comm": true, "contiguous_gradients": true, "sub_group_size": 1.000000e+09, "reduce_bucket_size": 1.000000e+09, "stage3_prefetch_bucket_size": 1.000000e+09, "stage3_param_persistence_threshold": 1.000000e+07, "stage3_max_live_parameters": 1.000000e+09, "stage3_max_reuse_distance": 1.000000e+09, "stage3_gather_16bit_weights_on_model_save": true }, "fp16": { "enabled": false, "auto_cast": true, "loss_scale": 0, "initial_scale_power": 32, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": true }, "optimizer": { "type": "AdamW", "params": { "lr": 2e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.05 } }, "scheduler": { "type": "WarmupCosineLR", "params": { "warmup_min_ratio": 0, "cos_min_ratio": 0, "warmup_num_steps": 37, "warmup_type": "linear", "total_num_steps": 1.230000e+03 } }, "gradient_accumulation_steps": 8, "gradient_clipping": 1.0, "steps_per_print": inf, "train_batch_size": 128, "train_micro_batch_size_per_gpu": 2, "wall_clock_breakdown": true } [INFO|trainer.py:1721] 2024-07-31 02:11:44,835 >> ***** Running training ***** [INFO|trainer.py:1722] 2024-07-31 02:11:44,835 >> Num examples = 157,445 [INFO|trainer.py:1723] 2024-07-31 02:11:44,835 >> Num Epochs = 1 [INFO|trainer.py:1724] 2024-07-31 02:11:44,835 >> Instantaneous batch size per device = 2 [INFO|trainer.py:1727] 2024-07-31 02:11:44,835 >> Total train batch size (w. 
parallel, distributed & accumulation) = 128 [INFO|trainer.py:1728] 2024-07-31 02:11:44,835 >> Gradient Accumulation steps = 8 [INFO|trainer.py:1729] 2024-07-31 02:11:44,835 >> Total optimization steps = 1,230 [INFO|trainer.py:1730] 2024-07-31 02:11:44,839 >> Number of trainable parameters = 19,977,690,112 [INFO|integration_utils.py:722] 2024-07-31 02:11:44,842 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" wandb: Currently logged in as: darrendong (pku_kcl). Use `wandb login --relogin` to force relogin wandb: - Waiting for wandb.init()... wandb: \ Waiting for wandb.init()... wandb: wandb version 0.17.5 is available! To upgrade, please run: wandb: $ pip install wandb --upgrade wandb: Tracking run with wandb version 0.17.0 wandb: Run data is saved locally in /data/jcy/project/InternVL/internvl_chat/wandb/run-20240731_021148-lciklai1 wandb: Run `wandb offline` to turn off syncing. wandb: Syncing run major-galaxy-29 wandb: ⭐️ View project at https://wandb.ai/pku_kcl/huggingface wandb: 🚀 View run at https://wandb.ai/pku_kcl/huggingface/runs/lciklai1 0%| | 0/1230 [00:00> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200 [INFO|configuration_utils.py:473] 2024-07-31 06:06:29,350 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/config.json [INFO|configuration_utils.py:594] 2024-07-31 06:06:29,351 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/generation_config.json [INFO|modeling_utils.py:2501] 2024-07-31 06:07:21,029 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2433] 2024-07-31 06:07:21,031 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-07-31 06:07:21,031 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-07-31 06:07:21,031 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/added_tokens.json
[2024-07-31 06:07:21,072] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step200 is about to be saved!
[2024-07-31 06:07:23,930] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-31 06:07:23,930] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-31 06:07:25,164] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-31 06:07:25,169] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-31 06:08:28,345] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-31 06:08:28,346] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-07-31 06:08:28,371] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step200 is ready now!
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3608
[2024-07-31 06:08:37,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.59 | bwd_microstep: 5617.55 | bwd_inner_microstep: 5446.52 | bwd_allreduce_microstep: 170.96 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874
[2024-07-31 06:08:46,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.93 | bwd_microstep: 5102.78 | bwd_inner_microstep: 5083.43 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3928
[2024-07-31 06:08:55,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.28 | bwd_microstep: 5140.69 | bwd_inner_microstep: 5121.20 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614
[2024-07-31 06:09:03,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3189.02 | bwd_microstep: 4771.39 | bwd_inner_microstep: 4732.13 | bwd_allreduce_microstep: 39.18 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755
[2024-07-31 06:09:12,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.84 | bwd_microstep: 4981.96 | bwd_inner_microstep: 4962.52 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.19
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2102
[2024-07-31 06:09:20,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.42 | bwd_microstep: 5055.34 | bwd_inner_microstep: 4663.63 | bwd_allreduce_microstep: 391.65 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693
[2024-07-31 06:09:29,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.89 | bwd_microstep: 4877.49 | bwd_inner_microstep: 4858.08 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628
[2024-07-31 06:09:37,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 06:09:37,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.30 | bwd_microstep: 4948.07 | bwd_inner_microstep: 4899.88 | bwd_allreduce_microstep: 48.13 | step_microstep: 181.11
[2024-07-31 06:09:37,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28727.16 | bwd: 40495.22 | bwd_inner: 39767.32 | bwd_allreduce: 727.36 | step: 181.80
16%|█▋ | 201/1230 [3:57:43<32:22:23, 113.26s/it]
{'loss': 1.2595, 'learning_rate': 1.9081845123881002e-05, 'epoch': 0.16}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4018
[2024-07-31 06:09:47,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.47 | bwd_microstep: 5539.68 | bwd_inner_microstep: 5481.76 | bwd_allreduce_microstep: 57.86 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2323
[2024-07-31 06:09:56,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.17 | bwd_microstep: 5202.68 | bwd_inner_microstep: 4798.37 | bwd_allreduce_microstep: 404.25 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784
[2024-07-31 06:10:04,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.79 | bwd_microstep: 5224.30 | bwd_inner_microstep: 5162.62 | bwd_allreduce_microstep: 61.61 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3750
[2024-07-31 06:10:13,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.44 | bwd_microstep: 5075.33 | bwd_inner_microstep: 5016.74 | bwd_allreduce_microstep: 58.52 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3659
[2024-07-31 06:10:22,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.14 | bwd_microstep: 4968.27 | bwd_inner_microstep: 4933.06 | bwd_allreduce_microstep: 35.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708
[2024-07-31 06:10:31,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.31 | bwd_microstep: 5007.41 | bwd_inner_microstep: 4974.35 | bwd_allreduce_microstep: 32.99 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152
[2024-07-31 06:10:39,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.98 | bwd_microstep: 5050.76 | bwd_inner_microstep: 4658.52 | bwd_allreduce_microstep: 392.17 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2754
[2024-07-31 06:10:48,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 06:10:48,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.84 | bwd_microstep: 5025.55 | bwd_inner_microstep: 4633.59 | bwd_allreduce_microstep: 391.90 | step_microstep: 181.55
[2024-07-31 06:10:48,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29409.06 | bwd: 41093.96 | bwd_inner: 39658.93 | bwd_allreduce: 1434.55 | step: 182.13
16%|█▋ | 202/1230 [3:58:54<28:42:26, 100.53s/it]
{'loss': 1.195, 'learning_rate': 1.9070791211367984e-05, 'epoch': 0.16}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3918
[2024-07-31 06:10:57,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.11 | bwd_microstep: 5249.12 | bwd_inner_microstep: 5209.82 | bwd_allreduce_microstep: 39.23 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3885
[2024-07-31 06:11:06,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.51 | bwd_microstep: 5470.90 | bwd_inner_microstep: 5404.15 | bwd_allreduce_microstep: 66.68 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2830
[2024-07-31 06:11:15,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.86 | bwd_microstep: 5170.49 | bwd_inner_microstep: 4766.84 | bwd_allreduce_microstep: 403.58 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755
[2024-07-31 06:11:24,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.28 | bwd_microstep: 5169.05 | bwd_inner_microstep: 5114.32 | bwd_allreduce_microstep: 54.66 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 06:11:33,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.28 | bwd_microstep: 5047.78 | bwd_inner_microstep: 5019.35 | bwd_allreduce_microstep: 28.36 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 06:11:42,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.23 | bwd_microstep: 5189.00 | bwd_inner_microstep: 4783.98 | bwd_allreduce_microstep: 404.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 06:11:50,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3424.24 | bwd_microstep: 5051.11 | bwd_inner_microstep: 4999.25 | bwd_allreduce_microstep: 51.80 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2170
[2024-07-31 06:11:59,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47
[2024-07-31 06:11:59,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.28 | bwd_microstep: 5196.80 | bwd_inner_microstep: 4793.43 | bwd_allreduce_microstep: 403.30 | step_microstep: 181.38
[2024-07-31 06:11:59,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28748.68 | bwd: 41544.23 | bwd_inner: 40091.08 | bwd_allreduce: 1452.66 | step: 181.97
17%|█▋ | 203/1230 [4:00:05<26:07:31, 91.58s/it]
{'loss': 1.2302, 'learning_rate': 1.9059674396952963e-05, 'epoch': 0.17}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 06:12:08,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.64 | bwd_microstep: 5184.46 | bwd_inner_microstep: 5115.48 | bwd_allreduce_microstep: 68.91 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3839
[2024-07-31 06:12:17,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.28 | bwd_microstep: 5042.57 | bwd_inner_microstep: 5023.17 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604
[2024-07-31 06:12:25,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.69 | bwd_microstep: 5107.14 | bwd_inner_microstep: 5036.42 | bwd_allreduce_microstep: 70.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744
[2024-07-31 06:12:34,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.00 | bwd_microstep: 5160.11 | bwd_inner_microstep: 5105.61 | bwd_allreduce_microstep: 54.43 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725
[2024-07-31 06:12:43,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.96 | bwd_microstep: 5046.34 | bwd_inner_microstep: 5018.85 | bwd_allreduce_microstep: 27.42 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 06:12:52,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.93 | bwd_microstep: 4908.46 | bwd_inner_microstep: 4886.94 | bwd_allreduce_microstep: 21.45 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2131
[2024-07-31 06:13:00,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.98 | bwd_microstep: 5074.75 | bwd_inner_microstep: 4679.94 | bwd_allreduce_microstep: 394.74 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159
[2024-07-31 06:13:09,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 06:13:09,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.45 | bwd_microstep: 5065.83 | bwd_inner_microstep: 4671.81 | bwd_allreduce_microstep: 393.95 | step_microstep: 182.14
[2024-07-31 06:13:09,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29006.82 | bwd: 40589.64 | bwd_inner: 39538.15 | bwd_allreduce: 1051.00 | step: 182.75
17%|█▋ | 204/1230 [4:01:15<24:14:56, 85.08s/it]
{'loss': 1.228, 'learning_rate': 1.90484947577261e-05, 'epoch': 0.17}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2383
[2024-07-31 06:13:18,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.93 | bwd_microstep: 5618.75 | bwd_inner_microstep: 5188.76 | bwd_allreduce_microstep: 429.92 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3922
[2024-07-31 06:13:27,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.40 | bwd_microstep: 5236.06 | bwd_inner_microstep: 5190.83 | bwd_allreduce_microstep: 45.15 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3833
[2024-07-31 06:13:36,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.36 | bwd_microstep: 5242.54 | bwd_inner_microstep: 5173.74 | bwd_allreduce_microstep: 68.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738
[2024-07-31 06:13:45,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.78 | bwd_microstep: 5102.74 | bwd_inner_microstep: 5057.87 | bwd_allreduce_microstep: 44.80 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188
[2024-07-31 06:13:53,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.91 | bwd_microstep: 4984.11 | bwd_inner_microstep: 4597.85 | bwd_allreduce_microstep: 386.19 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 06:14:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.37 | bwd_microstep: 4816.53 | bwd_inner_microstep: 4795.21 | bwd_allreduce_microstep: 21.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692
[2024-07-31 06:14:10,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.06 | bwd_microstep: 5187.20 | bwd_inner_microstep: 5107.85 | bwd_allreduce_microstep: 79.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695
[2024-07-31 06:14:18,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69
[2024-07-31 06:14:18,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.66 | bwd_microstep: 4894.38 | bwd_inner_microstep: 4875.07 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.58
[2024-07-31 06:14:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28087.36 | bwd: 41082.30 | bwd_inner: 39987.13 | bwd_allreduce: 1094.67 | step: 182.26
17%|█▋ | 205/1230 [4:02:24<22:53:40, 80.41s/it]
{'loss': 1.2262, 'learning_rate': 1.903725237121322e-05, 'epoch': 0.17}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3551
[2024-07-31 06:14:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.30 | bwd_microstep: 5477.89 | bwd_inner_microstep: 5300.92 | bwd_allreduce_microstep: 176.90 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574
[2024-07-31 06:14:36,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.15 | bwd_microstep: 4915.39 | bwd_inner_microstep: 4855.17 | bwd_allreduce_microstep: 60.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783
[2024-07-31 06:14:45,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.47 | bwd_microstep: 5167.01 | bwd_inner_microstep: 5117.06 | bwd_allreduce_microstep: 49.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709
[2024-07-31 06:14:53,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.90 | bwd_microstep: 4972.17 | bwd_inner_microstep: 4941.58 | bwd_allreduce_microstep: 30.51 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201
[2024-07-31 06:15:01,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3034.84 | bwd_microstep: 4918.36 | bwd_inner_microstep: 4541.88 | bwd_allreduce_microstep: 376.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661
[2024-07-31 06:15:10,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.45 | bwd_microstep: 4994.01 | bwd_inner_microstep: 4944.71 | bwd_allreduce_microstep: 49.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671
[2024-07-31 06:15:18,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.16 | bwd_microstep: 4879.22 | bwd_inner_microstep: 4859.95 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3665
[2024-07-31 06:15:27,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 06:15:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.46 | bwd_microstep: 4989.52 | bwd_inner_microstep: 4953.06 | bwd_allreduce_microstep: 36.40 | step_microstep: 181.79
[2024-07-31 06:15:27,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28269.61 | bwd: 40313.55 | bwd_inner: 39514.24 | bwd_allreduce: 798.80 | step: 182.38
17%|█▋ | 206/1230 [4:03:33<21:53:30, 76.96s/it]
{'loss': 1.2366, 'learning_rate': 1.902594731537527e-05, 'epoch': 0.17}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3921
[2024-07-31 06:15:36,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3363.73 | bwd_microstep: 5202.89 | bwd_inner_microstep: 5149.10 | bwd_allreduce_microstep: 53.73 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832
[2024-07-31 06:15:45,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.64 | bwd_microstep: 5101.78 | bwd_inner_microstep: 5077.29 | bwd_allreduce_microstep: 24.43 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754
[2024-07-31 06:15:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.95 | bwd_microstep: 5167.55 | bwd_inner_microstep: 5112.78 | bwd_allreduce_microstep: 54.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 06:16:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.95 | bwd_microstep: 5081.20 | bwd_inner_microstep: 4689.22 | bwd_allreduce_microstep: 391.91 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577
[2024-07-31 06:16:11,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.60 | bwd_microstep: 5098.98 | bwd_inner_microstep: 5022.96 | bwd_allreduce_microstep: 75.95 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2687
[2024-07-31 06:16:20,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.58 | bwd_microstep: 5096.82 | bwd_inner_microstep: 4698.71 | bwd_allreduce_microstep: 398.05 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3691
[2024-07-31 06:16:28,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.31 | bwd_microstep: 5119.24 | bwd_inner_microstep: 5030.70 | bwd_allreduce_microstep: 88.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667
[2024-07-31 06:16:37,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 06:16:37,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.43 | bwd_microstep: 5126.83 | bwd_inner_microstep: 5056.77 | bwd_allreduce_microstep: 69.98 | step_microstep: 181.17
[2024-07-31 06:16:37,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28552.10 | bwd: 40995.27 | bwd_inner: 39837.47 | bwd_allreduce: 1157.33 | step: 181.76
17%|█▋ | 207/1230 [4:04:43<21:15:58, 74.84s/it]
{'loss': 1.2131, 'learning_rate': 1.901457966860779e-05, 'epoch': 0.17}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3893
[2024-07-31 06:16:46,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.07 | bwd_microstep: 5283.59 | bwd_inner_microstep: 5225.24 | bwd_allreduce_microstep: 58.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825
[2024-07-31 06:16:55,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.86 | bwd_microstep: 5062.06 | bwd_inner_microstep: 5041.84 | bwd_allreduce_microstep: 20.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807
[2024-07-31 06:17:04,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.04 | bwd_microstep: 5139.69 | bwd_inner_microstep: 5094.03 | bwd_allreduce_microstep: 45.60 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630
[2024-07-31 06:17:13,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.19 | bwd_microstep: 5202.80 | bwd_inner_microstep: 5117.97 | bwd_allreduce_microstep: 84.76 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174
[2024-07-31 06:17:21,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.05 | bwd_microstep: 5229.25 | bwd_inner_microstep: 4821.03 | bwd_allreduce_microstep: 408.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 06:17:30,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.77 | bwd_microstep: 5162.77 | bwd_inner_microstep: 5079.43 | bwd_allreduce_microstep: 83.27 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680
[2024-07-31 06:17:39,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.31 | bwd_microstep: 4865.02 | bwd_inner_microstep: 4845.65 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 06:17:48,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 06:17:48,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.12 | bwd_microstep: 5004.10 | bwd_inner_microstep: 4948.64 | bwd_allreduce_microstep: 55.39 | step_microstep: 182.74
[2024-07-31 06:17:48,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29061.32 | bwd: 40949.26 | bwd_inner: 40173.77 | bwd_allreduce: 775.01 | step: 183.33
17%|█▋ | 208/1230 [4:05:53<20:51:48, 73.49s/it]
{'loss': 1.2044, 'learning_rate': 1.9003149509740347e-05, 'epoch': 0.17}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3957
[2024-07-31 06:17:57,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.81 | bwd_microstep: 5511.15 | bwd_inner_microstep: 5417.97 | bwd_allreduce_microstep: 93.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784
[2024-07-31 06:18:06,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.10 | bwd_microstep: 5285.27 | bwd_inner_microstep: 5236.45 | bwd_allreduce_microstep: 48.76 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683
[2024-07-31 06:18:15,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.33 | bwd_microstep: 5102.57 | bwd_inner_microstep: 5052.85 | bwd_allreduce_microstep: 49.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610
[2024-07-31 06:18:23,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.32 | bwd_microstep: 5104.82 | bwd_inner_microstep: 5028.61 | bwd_allreduce_microstep: 76.14 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2106 [2024-07-31 06:18:32,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.75 | bwd_microstep: 5164.81 | bwd_inner_microstep: 4762.97 | bwd_allreduce_microstep: 401.77 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 06:18:41,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.70 | bwd_microstep: 5162.96 | bwd_inner_microstep: 5108.64 | bwd_allreduce_microstep: 54.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-31 06:18:50,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.38 | bwd_microstep: 5058.69 | bwd_inner_microstep: 4990.43 | bwd_allreduce_microstep: 68.19 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3689 [2024-07-31 06:18:58,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-31 06:18:58,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.19 | bwd_microstep: 4839.19 | bwd_inner_microstep: 4819.87 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.86 [2024-07-31 06:18:58,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29221.49 | bwd: 41229.44 | bwd_inner: 40417.72 | bwd_allreduce: 811.24 | step: 182.45 17%|█▋ | 209/1230 [4:07:04<20:36:46, 72.68s/it] {'loss': 1.2142, 'learning_rate': 1.899165691803601e-05, 'epoch': 0.17} 17%|█▋ | 209/1230 [4:07:04<20:36:46, 72.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3931 [2024-07-31 06:19:07,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.90 | bwd_microstep: 5312.36 | bwd_inner_microstep: 5259.16 | bwd_allreduce_microstep: 53.13 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 
[2024-07-31 06:19:16,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.40 | bwd_microstep: 5395.71 | bwd_inner_microstep: 5325.76 | bwd_allreduce_microstep: 69.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-31 06:19:25,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.31 | bwd_microstep: 5044.00 | bwd_inner_microstep: 5024.55 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 06:19:33,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.64 | bwd_microstep: 4883.43 | bwd_inner_microstep: 4853.80 | bwd_allreduce_microstep: 29.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 06:19:42,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.99 | bwd_microstep: 5021.93 | bwd_inner_microstep: 4997.05 | bwd_allreduce_microstep: 24.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 06:19:51,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.54 | bwd_microstep: 5009.51 | bwd_inner_microstep: 4953.64 | bwd_allreduce_microstep: 55.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 06:20:00,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.27 | bwd_microstep: 4988.07 | bwd_inner_microstep: 4967.69 | bwd_allreduce_microstep: 20.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 06:20:08,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 06:20:08,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.61 | bwd_microstep: 4739.96 | 
bwd_inner_microstep: 4713.84 | bwd_allreduce_microstep: 26.05 | step_microstep: 182.10 [2024-07-31 06:20:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28601.58 | bwd: 40394.95 | bwd_inner: 40095.41 | bwd_allreduce: 299.05 | step: 182.69 17%|█▋ | 210/1230 [4:08:14<20:18:29, 71.68s/it] {'loss': 1.2214, 'learning_rate': 1.8980101973190787e-05, 'epoch': 0.17} 17%|█▋ | 210/1230 [4:08:14<20:18:29, 71.68s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3951 [2024-07-31 06:20:17,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.06 | bwd_microstep: 5199.26 | bwd_inner_microstep: 5180.15 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3797 [2024-07-31 06:20:26,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.36 | bwd_microstep: 5273.53 | bwd_inner_microstep: 5195.23 | bwd_allreduce_microstep: 78.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-07-31 06:20:34,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.77 | bwd_microstep: 5012.98 | bwd_inner_microstep: 4993.55 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 06:20:42,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3179.23 | bwd_microstep: 4863.76 | bwd_inner_microstep: 4819.59 | bwd_allreduce_microstep: 44.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 06:20:51,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.85 | bwd_microstep: 4977.46 | bwd_inner_microstep: 4958.02 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1083 [2024-07-31 06:21:00,555] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.01 | bwd_microstep: 5318.94 | bwd_inner_microstep: 4907.12 | bwd_allreduce_microstep: 411.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 06:21:09,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.53 | bwd_microstep: 4872.14 | bwd_inner_microstep: 4852.84 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 06:21:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 06:21:17,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.94 | bwd_microstep: 5122.21 | bwd_inner_microstep: 5056.68 | bwd_allreduce_microstep: 65.47 | step_microstep: 181.86 [2024-07-31 06:21:17,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28846.66 | bwd: 40640.27 | bwd_inner: 39963.13 | bwd_allreduce: 676.64 | step: 182.46 17%|█▋ | 211/1230 [4:09:23<20:07:49, 71.12s/it] {'loss': 1.2449, 'learning_rate': 1.896848475533309e-05, 'epoch': 0.17} 17%|█▋ | 211/1230 [4:09:23<20:07:49, 71.12s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3546 [2024-07-31 06:21:26,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.54 | bwd_microstep: 5086.09 | bwd_inner_microstep: 5008.13 | bwd_allreduce_microstep: 77.85 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2332 [2024-07-31 06:21:35,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.24 | bwd_microstep: 5371.75 | bwd_inner_microstep: 4956.99 | bwd_allreduce_microstep: 414.69 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2055 [2024-07-31 06:21:44,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.18 | 
bwd_microstep: 5242.82 | bwd_inner_microstep: 4834.43 | bwd_allreduce_microstep: 408.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 06:21:52,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.04 | bwd_microstep: 5161.43 | bwd_inner_microstep: 5101.10 | bwd_allreduce_microstep: 60.26 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 06:22:01,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.24 | bwd_microstep: 5222.08 | bwd_inner_microstep: 4818.13 | bwd_allreduce_microstep: 403.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 06:22:10,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.83 | bwd_microstep: 5157.89 | bwd_inner_microstep: 5104.72 | bwd_allreduce_microstep: 53.10 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 06:22:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.86 | bwd_microstep: 5063.13 | bwd_inner_microstep: 4997.65 | bwd_allreduce_microstep: 65.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-07-31 06:22:27,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 06:22:27,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.52 | bwd_microstep: 4871.77 | bwd_inner_microstep: 4852.48 | bwd_allreduce_microstep: 19.22 | step_microstep: 181.88 [2024-07-31 06:22:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28442.35 | bwd: 41176.93 | bwd_inner: 39673.59 | bwd_allreduce: 1502.84 | step: 182.59 17%|█▋ | 212/1230 [4:10:33<20:00:43, 70.77s/it] {'loss': 1.1548, 'learning_rate': 1.8956805345023145e-05, 'epoch': 0.17} 17%|█▋ | 212/1230 
[4:10:33<20:00:43, 70.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2265 [2024-07-31 06:22:36,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.95 | bwd_microstep: 5336.71 | bwd_inner_microstep: 4925.83 | bwd_allreduce_microstep: 410.81 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2069 [2024-07-31 06:22:45,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.69 | bwd_microstep: 5446.00 | bwd_inner_microstep: 5023.80 | bwd_allreduce_microstep: 422.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3783 [2024-07-31 06:22:54,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.97 | bwd_microstep: 5019.69 | bwd_inner_microstep: 5000.34 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3770 [2024-07-31 06:23:03,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.80 | bwd_microstep: 5179.30 | bwd_inner_microstep: 5099.90 | bwd_allreduce_microstep: 79.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 06:23:12,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.34 | bwd_microstep: 5177.70 | bwd_inner_microstep: 4774.93 | bwd_allreduce_microstep: 402.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 06:23:21,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.28 | bwd_microstep: 5183.80 | bwd_inner_microstep: 4781.10 | bwd_allreduce_microstep: 402.63 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 06:23:29,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.44 | bwd_microstep: 5141.33 | 
bwd_inner_microstep: 5068.94 | bwd_allreduce_microstep: 72.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 06:23:38,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 06:23:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.10 | bwd_microstep: 5072.29 | bwd_inner_microstep: 5012.67 | bwd_allreduce_microstep: 59.55 | step_microstep: 181.73 [2024-07-31 06:23:38,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28797.47 | bwd: 41556.81 | bwd_inner: 39687.45 | bwd_allreduce: 1868.87 | step: 182.33 17%|█▋ | 213/1230 [4:11:44<19:59:06, 70.74s/it] {'loss': 1.1714, 'learning_rate': 1.894506382325248e-05, 'epoch': 0.17} 17%|█▋ | 213/1230 [4:11:44<19:59:06, 70.74s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4060 [2024-07-31 06:23:47,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3869.17 | bwd_microstep: 5352.02 | bwd_inner_microstep: 5332.94 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3929 [2024-07-31 06:23:56,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.01 | bwd_microstep: 5160.64 | bwd_inner_microstep: 5141.29 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2289 [2024-07-31 06:24:05,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.75 | bwd_microstep: 5119.43 | bwd_inner_microstep: 4720.32 | bwd_allreduce_microstep: 399.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 06:24:14,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.93 | bwd_microstep: 5205.87 | bwd_inner_microstep: 5120.01 | bwd_allreduce_microstep: 85.79 | step_microstep: 0.08 
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 06:24:23,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.35 | bwd_microstep: 5238.09 | bwd_inner_microstep: 4828.52 | bwd_allreduce_microstep: 409.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 06:24:31,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.67 | bwd_microstep: 4889.30 | bwd_inner_microstep: 4870.02 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3729 [2024-07-31 06:24:39,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3121.75 | bwd_microstep: 4953.91 | bwd_inner_microstep: 4913.77 | bwd_allreduce_microstep: 40.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2134 [2024-07-31 06:24:48,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 06:24:48,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.71 | bwd_microstep: 5106.39 | bwd_inner_microstep: 4709.92 | bwd_allreduce_microstep: 396.40 | step_microstep: 186.01 [2024-07-31 06:24:48,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28677.23 | bwd: 41025.63 | bwd_inner: 39636.73 | bwd_allreduce: 1388.42 | step: 186.62 17%|█▋ | 214/1230 [4:12:54<19:54:23, 70.53s/it] {'loss': 1.1764, 'learning_rate': 1.8933260271443313e-05, 'epoch': 0.17} 17%|█▋ | 214/1230 [4:12:54<19:54:23, 70.53s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2350 [2024-07-31 06:24:57,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.59 | bwd_microstep: 5341.97 | bwd_inner_microstep: 4929.38 | bwd_allreduce_microstep: 412.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 
[2024-07-31 06:25:06,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.34 | bwd_microstep: 5115.29 | bwd_inner_microstep: 5040.35 | bwd_allreduce_microstep: 74.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 06:25:15,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.88 | bwd_microstep: 5117.98 | bwd_inner_microstep: 5077.47 | bwd_allreduce_microstep: 40.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 06:25:23,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.91 | bwd_microstep: 5175.28 | bwd_inner_microstep: 5118.27 | bwd_allreduce_microstep: 56.95 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2080 [2024-07-31 06:25:32,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.23 | bwd_microstep: 5224.44 | bwd_inner_microstep: 4819.50 | bwd_allreduce_microstep: 404.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-31 06:25:41,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.84 | bwd_microstep: 5103.94 | bwd_inner_microstep: 4708.01 | bwd_allreduce_microstep: 395.86 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2151 [2024-07-31 06:25:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.32 | bwd_microstep: 5146.04 | bwd_inner_microstep: 4744.23 | bwd_allreduce_microstep: 401.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 06:25:58,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 06:25:58,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.26 | bwd_microstep: 5075.60 | 
bwd_inner_microstep: 5032.57 | bwd_allreduce_microstep: 42.96 | step_microstep: 181.78 [2024-07-31 06:25:58,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28518.27 | bwd: 41300.52 | bwd_inner: 39469.72 | bwd_allreduce: 1830.31 | step: 182.37 17%|█▋ | 215/1230 [4:14:04<19:51:14, 70.42s/it] {'loss': 1.2932, 'learning_rate': 1.8921394771448032e-05, 'epoch': 0.17} 17%|█▋ | 215/1230 [4:14:04<19:51:14, 70.42s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3912 [2024-07-31 06:26:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.28 | bwd_microstep: 5203.84 | bwd_inner_microstep: 5157.40 | bwd_allreduce_microstep: 46.37 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3850 [2024-07-31 06:26:16,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.88 | bwd_microstep: 5177.27 | bwd_inner_microstep: 5141.01 | bwd_allreduce_microstep: 36.19 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3912 [2024-07-31 06:26:25,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3821.00 | bwd_microstep: 5164.72 | bwd_inner_microstep: 5144.55 | bwd_allreduce_microstep: 20.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 06:26:34,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.48 | bwd_microstep: 5291.05 | bwd_inner_microstep: 4880.24 | bwd_allreduce_microstep: 410.74 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-31 06:26:43,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.64 | bwd_microstep: 5174.43 | bwd_inner_microstep: 4770.62 | bwd_allreduce_microstep: 403.74 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2099 [2024-07-31 06:26:51,954] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.77 | bwd_microstep: 5205.58 | bwd_inner_microstep: 4800.14 | bwd_allreduce_microstep: 405.37 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3722 [2024-07-31 06:27:00,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.04 | bwd_microstep: 5099.26 | bwd_inner_microstep: 5034.98 | bwd_allreduce_microstep: 64.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-31 06:27:09,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 06:27:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.49 | bwd_microstep: 5046.05 | bwd_inner_microstep: 4988.17 | bwd_allreduce_microstep: 57.82 | step_microstep: 182.16 [2024-07-31 06:27:09,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28964.47 | bwd: 41362.19 | bwd_inner: 39917.05 | bwd_allreduce: 1444.65 | step: 182.85 18%|█▊ | 216/1230 [4:15:15<19:51:17, 70.49s/it] {'loss': 1.2256, 'learning_rate': 1.89094674055486e-05, 'epoch': 0.18} 18%|█▊ | 216/1230 [4:15:15<19:51:17, 70.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3942 [2024-07-31 06:27:18,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.95 | bwd_microstep: 5147.62 | bwd_inner_microstep: 5121.66 | bwd_allreduce_microstep: 25.89 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-31 06:27:27,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.79 | bwd_microstep: 5296.02 | bwd_inner_microstep: 4885.32 | bwd_allreduce_microstep: 410.63 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2289 [2024-07-31 06:27:35,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.66 | 
bwd_microstep: 5153.81 | bwd_inner_microstep: 4753.18 | bwd_allreduce_microstep: 400.57 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3599 [2024-07-31 06:27:44,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.96 | bwd_microstep: 5176.77 | bwd_inner_microstep: 5072.32 | bwd_allreduce_microstep: 104.39 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 06:27:53,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.41 | bwd_microstep: 4891.76 | bwd_inner_microstep: 4872.43 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 06:28:01,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.73 | bwd_microstep: 4906.54 | bwd_inner_microstep: 4887.10 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 06:28:10,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.20 | bwd_microstep: 5038.17 | bwd_inner_microstep: 4984.28 | bwd_allreduce_microstep: 53.82 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2136 [2024-07-31 06:28:19,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 06:28:19,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.82 | bwd_microstep: 5232.92 | bwd_inner_microstep: 4824.39 | bwd_allreduce_microstep: 408.46 | step_microstep: 181.45 [2024-07-31 06:28:19,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28780.41 | bwd: 40843.60 | bwd_inner: 39400.62 | bwd_allreduce: 1442.48 | step: 182.05 18%|█▊ | 217/1230 [4:16:25<19:47:25, 70.33s/it] {'loss': 1.2403, 'learning_rate': 1.889747825645599e-05, 'epoch': 0.18} 18%|█▊ | 217/1230 
[4:16:25<19:47:25, 70.33s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 06:28:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3834.85 | bwd_microstep: 5346.09 | bwd_inner_microstep: 5326.98 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3840 [2024-07-31 06:28:37,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.78 | bwd_microstep: 5308.00 | bwd_inner_microstep: 5244.38 | bwd_allreduce_microstep: 63.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3916 [2024-07-31 06:28:46,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.25 | bwd_microstep: 5074.49 | bwd_inner_microstep: 5046.42 | bwd_allreduce_microstep: 28.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-31 06:28:55,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.05 | bwd_microstep: 5011.68 | bwd_inner_microstep: 4992.34 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 06:29:03,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.90 | bwd_microstep: 4938.66 | bwd_inner_microstep: 4906.00 | bwd_allreduce_microstep: 32.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-31 06:29:11,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3057.05 | bwd_microstep: 5041.21 | bwd_inner_microstep: 4653.64 | bwd_allreduce_microstep: 387.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 06:29:20,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.57 | bwd_microstep: 5141.21 
| bwd_inner_microstep: 4740.28 | bwd_allreduce_microstep: 400.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 06:29:29,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 06:29:29,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.24 | bwd_microstep: 4908.88 | bwd_inner_microstep: 4889.51 | bwd_allreduce_microstep: 19.31 | step_microstep: 182.52 [2024-07-31 06:29:29,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28732.60 | bwd: 40770.21 | bwd_inner: 39799.49 | bwd_allreduce: 970.23 | step: 183.10 18%|█▊ | 218/1230 [4:17:35<19:43:45, 70.18s/it] {'loss': 1.2661, 'learning_rate': 1.8885427407309627e-05, 'epoch': 0.18} 18%|█▊ | 218/1230 [4:17:35<19:43:45, 70.18s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3951 [2024-07-31 06:29:38,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.44 | bwd_microstep: 5202.15 | bwd_inner_microstep: 5175.15 | bwd_allreduce_microstep: 26.93 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3819 [2024-07-31 06:29:47,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.42 | bwd_microstep: 5107.87 | bwd_inner_microstep: 5065.77 | bwd_allreduce_microstep: 42.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2285 [2024-07-31 06:29:55,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.41 | bwd_microstep: 5125.95 | bwd_inner_microstep: 4726.43 | bwd_allreduce_microstep: 399.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 06:30:04,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.70 | bwd_microstep: 5232.44 | bwd_inner_microstep: 4826.30 | bwd_allreduce_microstep: 406.08 | step_microstep: 0.08 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 06:30:13,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.91 | bwd_microstep: 5046.87 | bwd_inner_microstep: 5015.15 | bwd_allreduce_microstep: 31.64 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 06:30:22,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.50 | bwd_microstep: 4983.05 | bwd_inner_microstep: 4963.69 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1136 [2024-07-31 06:30:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.39 | bwd_microstep: 5141.82 | bwd_inner_microstep: 4744.35 | bwd_allreduce_microstep: 397.40 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 06:30:39,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 06:30:39,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.59 | bwd_microstep: 4973.73 | bwd_inner_microstep: 4919.21 | bwd_allreduce_microstep: 54.45 | step_microstep: 181.60 [2024-07-31 06:30:39,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29042.26 | bwd: 40813.85 | bwd_inner: 39436.00 | bwd_allreduce: 1377.36 | step: 182.21 18%|█▊ | 219/1230 [4:18:45<19:42:38, 70.19s/it] {'loss': 1.1906, 'learning_rate': 1.887331494167678e-05, 'epoch': 0.18} 18%|█▊ | 219/1230 [4:18:45<19:42:38, 70.19s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3920 [2024-07-31 06:30:48,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3707.15 | bwd_microstep: 5483.02 | bwd_inner_microstep: 5402.94 | bwd_allreduce_microstep: 80.01 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2359 [2024-07-31 
06:30:57,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.47 | bwd_microstep: 5192.25 | bwd_inner_microstep: 4787.77 | bwd_allreduce_microstep: 404.41 | step_microstep: 0.18
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3640
[2024-07-31 06:31:06,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.69 | bwd_microstep: 5209.24 | bwd_inner_microstep: 5113.35 | bwd_allreduce_microstep: 95.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746
[2024-07-31 06:31:15,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.92 | bwd_microstep: 4992.53 | bwd_inner_microstep: 4972.50 | bwd_allreduce_microstep: 19.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3630
[2024-07-31 06:31:23,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3418.62 | bwd_microstep: 5083.83 | bwd_inner_microstep: 5000.43 | bwd_allreduce_microstep: 83.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638
[2024-07-31 06:31:31,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3171.98 | bwd_microstep: 4706.80 | bwd_inner_microstep: 4683.25 | bwd_allreduce_microstep: 23.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156
[2024-07-31 06:31:40,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.03 | bwd_microstep: 5119.23 | bwd_inner_microstep: 4723.92 | bwd_allreduce_microstep: 395.25 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 06:31:48,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 06:31:48,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.66 | bwd_microstep: 5061.87 | bwd_inner_microstep: 5003.40 | bwd_allreduce_microstep: 58.40 | step_microstep: 181.70
[2024-07-31 06:31:48,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28308.41 | bwd: 40848.77 | bwd_inner: 39687.50 | bwd_allreduce: 1160.78 | step: 182.39
18%|█▊ | 220/1230 [4:19:54<19:37:56, 69.98s/it]
{'loss': 1.2643, 'learning_rate': 1.8861140943552014e-05, 'epoch': 0.18}
18%|█▊ | 220/1230 [4:19:54<19:37:56, 69.98s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2329
[2024-07-31 06:31:57,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.22 | bwd_microstep: 5162.99 | bwd_inner_microstep: 4764.27 | bwd_allreduce_microstep: 398.65 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 06:32:06,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.13 | bwd_microstep: 5164.24 | bwd_inner_microstep: 5123.86 | bwd_allreduce_microstep: 40.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768
[2024-07-31 06:32:15,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.76 | bwd_microstep: 5004.73 | bwd_inner_microstep: 4985.39 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 06:32:23,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3328.00 | bwd_microstep: 4775.01 | bwd_inner_microstep: 4748.04 | bwd_allreduce_microstep: 26.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740
[2024-07-31 06:32:32,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.62 | bwd_microstep: 4992.07 | bwd_inner_microstep: 4972.78 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174
[2024-07-31 06:32:40,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3447.78 | bwd_microstep: 5013.82 | bwd_inner_microstep: 4625.49 | bwd_allreduce_microstep: 388.26 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148
[2024-07-31 06:32:49,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.58 | bwd_microstep: 5138.04 | bwd_inner_microstep: 4739.13 | bwd_allreduce_microstep: 398.84 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146
[2024-07-31 06:32:57,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 06:32:57,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.92 | bwd_microstep: 4929.85 | bwd_inner_microstep: 4553.04 | bwd_allreduce_microstep: 376.74 | step_microstep: 181.40
[2024-07-31 06:32:57,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28105.92 | bwd: 40180.74 | bwd_inner: 38511.95 | bwd_allreduce: 1668.30 | step: 181.98
18%|█▊ | 221/1230 [4:21:03<19:29:55, 69.57s/it]
{'loss': 1.209, 'learning_rate': 1.884890549735659e-05, 'epoch': 0.18}
18%|█▊ | 221/1230 [4:21:03<19:29:55, 69.57s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3914
[2024-07-31 06:33:06,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.45 | bwd_microstep: 5571.36 | bwd_inner_microstep: 5474.47 | bwd_allreduce_microstep: 96.82 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3859
[2024-07-31 06:33:15,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.57 | bwd_microstep: 5355.13 | bwd_inner_microstep: 5300.52 | bwd_allreduce_microstep: 54.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715
[2024-07-31 06:33:24,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.18 | bwd_microstep: 5084.17 | bwd_inner_microstep: 5040.55 | bwd_allreduce_microstep: 43.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3878
[2024-07-31 06:33:33,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.02 | bwd_microstep: 5126.25 | bwd_inner_microstep: 5106.95 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758
[2024-07-31 06:33:42,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.50 | bwd_microstep: 5020.62 | bwd_inner_microstep: 4998.22 | bwd_allreduce_microstep: 22.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 06:33:51,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.83 | bwd_microstep: 4999.53 | bwd_inner_microstep: 4980.13 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 06:33:59,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.18 | bwd_microstep: 5026.72 | bwd_inner_microstep: 5001.43 | bwd_allreduce_microstep: 25.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701
[2024-07-31 06:34:08,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 06:34:08,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.39 | bwd_microstep: 4914.20 | bwd_inner_microstep: 4891.19 | bwd_allreduce_microstep: 22.94 | step_microstep: 181.97
[2024-07-31 06:34:08,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29704.03 | bwd: 41097.95 | bwd_inner: 40793.41 | bwd_allreduce: 304.05 | step: 182.56
18%|█▊ | 222/1230 [4:22:14<19:36:40, 70.04s/it]
{'loss': 1.1859, 'learning_rate': 1.8836608687937883e-05, 'epoch': 0.18}
18%|█▊ | 222/1230 [4:22:14<19:36:40, 70.04s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3896
[2024-07-31 06:34:17,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.57 | bwd_microstep: 5218.53 | bwd_inner_microstep: 5175.04 | bwd_allreduce_microstep: 43.42 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305
[2024-07-31 06:34:26,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.64 | bwd_microstep: 5202.94 | bwd_inner_microstep: 4797.72 | bwd_allreduce_microstep: 405.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789
[2024-07-31 06:34:35,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.26 | bwd_microstep: 5060.30 | bwd_inner_microstep: 5039.07 | bwd_allreduce_microstep: 21.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879
[2024-07-31 06:34:44,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.09 | bwd_microstep: 5180.22 | bwd_inner_microstep: 5133.31 | bwd_allreduce_microstep: 46.84 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731
[2024-07-31 06:34:52,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.04 | bwd_microstep: 5175.10 | bwd_inner_microstep: 5118.85 | bwd_allreduce_microstep: 56.18 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666
[2024-07-31 06:35:01,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.21 | bwd_microstep: 5210.25 | bwd_inner_microstep: 5117.63 | bwd_allreduce_microstep: 92.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656
[2024-07-31 06:35:09,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.30 | bwd_microstep: 4833.98 | bwd_inner_microstep: 4793.03 | bwd_allreduce_microstep: 40.88 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 06:35:18,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 06:35:18,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.25 | bwd_microstep: 5181.32 | bwd_inner_microstep: 4778.50 | bwd_allreduce_microstep: 402.75 | step_microstep: 181.52
[2024-07-31 06:35:18,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28553.27 | bwd: 41062.62 | bwd_inner: 39953.08 | bwd_allreduce: 1109.06 | step: 182.09
18%|█▊ | 223/1230 [4:23:24<19:35:02, 70.01s/it]
{'loss': 1.1723, 'learning_rate': 1.8824250600568798e-05, 'epoch': 0.18}
18%|█▊ | 223/1230 [4:23:24<19:35:02, 70.01s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809
[2024-07-31 06:35:27,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.89 | bwd_microstep: 5053.15 | bwd_inner_microstep: 5033.96 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3876
[2024-07-31 06:35:36,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.84 | bwd_microstep: 5117.19 | bwd_inner_microstep: 5097.82 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3590
[2024-07-31 06:35:45,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.03 | bwd_microstep: 5135.50 | bwd_inner_microstep: 5039.46 | bwd_allreduce_microstep: 95.96 | step_microstep: 0.19
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1213
[2024-07-31 06:35:53,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.90 | bwd_microstep: 5284.80 | bwd_inner_microstep: 4877.42 | bwd_allreduce_microstep: 407.31 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2070
[2024-07-31 06:36:02,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.19 | bwd_microstep: 5177.02 | bwd_inner_microstep: 4775.42 | bwd_allreduce_microstep: 401.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720
[2024-07-31 06:36:11,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.18 | bwd_microstep: 4977.55 | bwd_inner_microstep: 4958.14 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738
[2024-07-31 06:36:20,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.10 | bwd_microstep: 4989.09 | bwd_inner_microstep: 4969.69 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 06:36:29,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 06:36:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.85 | bwd_microstep: 5020.58 | bwd_inner_microstep: 4998.04 | bwd_allreduce_microstep: 22.47 | step_microstep: 181.28
[2024-07-31 06:36:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29370.90 | bwd: 40754.84 | bwd_inner: 39749.89 | bwd_allreduce: 1004.44 | step: 182.00
18%|█▊ | 224/1230 [4:24:35<19:36:06, 70.15s/it]
{'loss': 1.2263, 'learning_rate': 1.8811831320947174e-05, 'epoch': 0.18}
18%|█▊ | 224/1230 [4:24:35<19:36:06, 70.15s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2418
[2024-07-31 06:36:38,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.90 | bwd_microstep: 5372.19 | bwd_inner_microstep: 4958.13 | bwd_allreduce_microstep: 413.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569
[2024-07-31 06:36:46,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.30 | bwd_microstep: 5070.38 | bwd_inner_microstep: 4998.33 | bwd_allreduce_microstep: 71.98 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698
[2024-07-31 06:36:55,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.39 | bwd_microstep: 5173.15 | bwd_inner_microstep: 5099.21 | bwd_allreduce_microstep: 73.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617
[2024-07-31 06:37:04,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.64 | bwd_microstep: 5180.07 | bwd_inner_microstep: 5102.24 | bwd_allreduce_microstep: 77.76 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2159
[2024-07-31 06:37:12,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.23 | bwd_microstep: 4881.09 | bwd_inner_microstep: 4504.96 | bwd_allreduce_microstep: 376.06 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3662
[2024-07-31 06:37:20,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.59 | bwd_microstep: 5050.28 | bwd_inner_microstep: 4979.30 | bwd_allreduce_microstep: 70.92 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2178
[2024-07-31 06:37:29,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.81 | bwd_microstep: 5110.96 | bwd_inner_microstep: 4714.80 | bwd_allreduce_microstep: 396.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 06:37:38,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81
[2024-07-31 06:37:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.23 | bwd_microstep: 4916.54 | bwd_inner_microstep: 4893.20 | bwd_allreduce_microstep: 23.27 | step_microstep: 181.30
[2024-07-31 06:37:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28245.01 | bwd: 40754.65 | bwd_inner: 39250.13 | bwd_allreduce: 1504.05 | step: 181.89
18%|█▊ | 225/1230 [4:25:44<19:30:49, 69.90s/it]
{'loss': 1.2086, 'learning_rate': 1.879935093519519e-05, 'epoch': 0.18}
18%|█▊ | 225/1230 [4:25:44<19:30:49, 69.90s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3908
[2024-07-31 06:37:47,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.22 | bwd_microstep: 5112.16 | bwd_inner_microstep: 5083.29 | bwd_allreduce_microstep: 28.80 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 06:37:56,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.66 | bwd_microstep: 5156.54 | bwd_inner_microstep: 5081.07 | bwd_allreduce_microstep: 75.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612
[2024-07-31 06:38:04,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.42 | bwd_microstep: 5296.51 | bwd_inner_microstep: 5203.25 | bwd_allreduce_microstep: 93.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761
[2024-07-31 06:38:13,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.13 | bwd_microstep: 5155.16 | bwd_inner_microstep: 5103.37 | bwd_allreduce_microstep: 51.72 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646
[2024-07-31 06:38:22,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.43 | bwd_microstep: 5197.12 | bwd_inner_microstep: 5115.57 | bwd_allreduce_microstep: 81.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 06:38:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.87 | bwd_microstep: 5068.92 | bwd_inner_microstep: 5004.92 | bwd_allreduce_microstep: 63.93 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668
[2024-07-31 06:38:39,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.86 | bwd_microstep: 5064.79 | bwd_inner_microstep: 5005.17 | bwd_allreduce_microstep: 59.56 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 06:38:48,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 06:38:48,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.52 | bwd_microstep: 5040.64 | bwd_inner_microstep: 4649.12 | bwd_allreduce_microstep: 391.44 | step_microstep: 186.14
[2024-07-31 06:38:48,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28794.03 | bwd: 41091.82 | bwd_inner: 40245.69 | bwd_allreduce: 845.64 | step: 186.77
18%|█▊ | 226/1230 [4:26:54<19:31:20, 70.00s/it]
{'loss': 1.2279, 'learning_rate': 1.8786809529858766e-05, 'epoch': 0.18}
18%|█▊ | 226/1230 [4:26:54<19:31:20, 70.00s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3911
[2024-07-31 06:38:57,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.68 | bwd_microstep: 5006.00 | bwd_inner_microstep: 4986.93 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820
[2024-07-31 06:39:05,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.95 | bwd_microstep: 5086.57 | bwd_inner_microstep: 5047.92 | bwd_allreduce_microstep: 38.58 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3848
[2024-07-31 06:39:14,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.43 | bwd_microstep: 5136.94 | bwd_inner_microstep: 5113.36 | bwd_allreduce_microstep: 23.52 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720
[2024-07-31 06:39:23,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.90 | bwd_microstep: 4986.52 | bwd_inner_microstep: 4967.08 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722
[2024-07-31 06:39:32,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.66 | bwd_microstep: 5104.57 | bwd_inner_microstep: 5054.02 | bwd_allreduce_microstep: 50.48 | step_microstep: 0.20
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 06:39:40,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.22 | bwd_microstep: 5043.19 | bwd_inner_microstep: 4986.82 | bwd_allreduce_microstep: 56.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683
[2024-07-31 06:39:49,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.29 | bwd_microstep: 5025.50 | bwd_inner_microstep: 4989.33 | bwd_allreduce_microstep: 36.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 06:39:58,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47
[2024-07-31 06:39:58,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.34 | bwd_microstep: 4982.47 | bwd_inner_microstep: 4931.18 | bwd_allreduce_microstep: 51.23 | step_microstep: 181.64
[2024-07-31 06:39:58,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29028.37 | bwd: 40371.76 | bwd_inner: 40076.58 | bwd_allreduce: 294.69 | step: 182.35
18%|█▊ | 227/1230 [4:28:04<19:28:50, 69.92s/it]
{'loss': 1.1943, 'learning_rate': 1.8774207191906976e-05, 'epoch': 0.18}
18%|█▊ | 227/1230 [4:28:04<19:28:50, 69.92s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 06:40:07,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.13 | bwd_microstep: 5370.34 | bwd_inner_microstep: 5347.26 | bwd_allreduce_microstep: 23.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800
[2024-07-31 06:40:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.58 | bwd_microstep: 5238.46 | bwd_inner_microstep: 5178.97 | bwd_allreduce_microstep: 59.43 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614
[2024-07-31 06:40:25,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.04 | bwd_microstep: 5115.54 | bwd_inner_microstep: 5044.76 | bwd_allreduce_microstep: 70.72 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 06:40:34,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.65 | bwd_microstep: 5190.69 | bwd_inner_microstep: 5108.25 | bwd_allreduce_microstep: 82.37 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2219
[2024-07-31 06:40:42,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3055.94 | bwd_microstep: 5037.64 | bwd_inner_microstep: 4650.07 | bwd_allreduce_microstep: 387.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714
[2024-07-31 06:40:50,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.06 | bwd_microstep: 4782.22 | bwd_inner_microstep: 4762.93 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 06:40:58,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.01 | bwd_microstep: 5124.27 | bwd_inner_microstep: 5054.71 | bwd_allreduce_microstep: 69.49 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 06:41:07,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 06:41:07,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.13 | bwd_microstep: 5169.69 | bwd_inner_microstep: 5094.77 | bwd_allreduce_microstep: 74.85 | step_microstep: 181.45
[2024-07-31 06:41:07,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28089.45 | bwd: 41028.82 | bwd_inner: 40241.65 | bwd_allreduce: 786.69 | step: 182.05
19%|█▊ | 228/1230 [4:29:13<19:25:19, 69.78s/it]
{'loss': 1.1616, 'learning_rate': 1.8761544008731426e-05, 'epoch': 0.19}
19%|█▊ | 228/1230 [4:29:13<19:25:19, 69.78s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 06:41:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3895.36 | bwd_microstep: 5396.39 | bwd_inner_microstep: 5377.36 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610
[2024-07-31 06:41:26,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.92 | bwd_microstep: 5277.95 | bwd_inner_microstep: 5187.07 | bwd_allreduce_microstep: 90.81 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805
[2024-07-31 06:41:35,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.66 | bwd_microstep: 5263.33 | bwd_inner_microstep: 5201.33 | bwd_allreduce_microstep: 61.93 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3896
[2024-07-31 06:41:43,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.16 | bwd_microstep: 5157.96 | bwd_inner_microstep: 5112.76 | bwd_allreduce_microstep: 45.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 06:41:52,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.14 | bwd_microstep: 5130.85 | bwd_inner_microstep: 5077.98 | bwd_allreduce_microstep: 52.81 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700
[2024-07-31 06:42:01,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.61 | bwd_microstep: 5044.57 | bwd_inner_microstep: 5003.50 | bwd_allreduce_microstep: 41.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709
[2024-07-31 06:42:10,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.74 | bwd_microstep: 4932.54 | bwd_inner_microstep: 4908.76 | bwd_allreduce_microstep: 23.71 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 06:42:18,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 06:42:18,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3244.42 | bwd_microstep: 4788.35 | bwd_inner_microstep: 4769.01 | bwd_allreduce_microstep: 19.27 | step_microstep: 183.13
[2024-07-31 06:42:18,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29152.91 | bwd: 40991.93 | bwd_inner: 40637.72 | bwd_allreduce: 353.73 | step: 183.69
19%|█▊ | 229/1230 [4:30:24<19:27:40, 69.99s/it]
{'loss': 1.1972, 'learning_rate': 1.874882006814565e-05, 'epoch': 0.19}
19%|█▊ | 229/1230 [4:30:24<19:27:40, 69.99s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 06:42:27,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3891.70 | bwd_microstep: 5356.86 | bwd_inner_microstep: 5337.83 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2827
[2024-07-31 06:42:36,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.06 | bwd_microstep: 5297.12 | bwd_inner_microstep: 4885.53 | bwd_allreduce_microstep: 411.51 | step_microstep: 0.12
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2247
[2024-07-31 06:42:45,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.12 | bwd_microstep: 5220.32 | bwd_inner_microstep: 4815.82 | bwd_allreduce_microstep: 404.43 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773
[2024-07-31 06:42:54,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.82 | bwd_microstep: 5004.49 | bwd_inner_microstep: 4985.13 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3728
[2024-07-31 06:43:02,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.59 | bwd_microstep: 5182.57 | bwd_inner_microstep: 5114.55 | bwd_allreduce_microstep: 67.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 06:43:10,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.84 | bwd_microstep: 4712.81 | bwd_inner_microstep: 4691.23 | bwd_allreduce_microstep: 21.51 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 06:43:19,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.08 | bwd_microstep: 4988.94 | bwd_inner_microstep: 4940.67 | bwd_allreduce_microstep: 48.20 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3172
[2024-07-31 06:43:28,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49
[2024-07-31 06:43:28,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.02 | bwd_microstep: 5015.31 | bwd_inner_microstep: 4834.50 | bwd_allreduce_microstep: 180.74 | step_microstep: 182.48
[2024-07-31 06:43:28,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28635.15 | bwd: 40778.40 | bwd_inner: 39605.20 | bwd_allreduce: 1172.70 | step: 183.10
19%|█▊ | 230/1230 [4:31:33<19:25:19, 69.92s/it]
{'loss': 1.2034, 'learning_rate': 1.8736035458384528e-05, 'epoch': 0.19}
19%|█▊ | 230/1230 [4:31:33<19:25:19, 69.92s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635
[2024-07-31 06:43:37,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.18 | bwd_microstep: 5315.67 | bwd_inner_microstep: 5220.89 | bwd_allreduce_microstep: 94.71 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3730
[2024-07-31 06:43:46,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.91 | bwd_microstep: 5246.11 | bwd_inner_microstep: 5149.64 | bwd_allreduce_microstep: 96.41 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3585
[2024-07-31 06:43:54,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.76 | bwd_microstep: 5215.81 | bwd_inner_microstep: 5114.78 | bwd_allreduce_microstep: 100.96 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758
[2024-07-31 06:44:03,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3269.34 | bwd_microstep: 5021.42 | bwd_inner_microstep: 4980.59 | bwd_allreduce_microstep: 40.77 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3728
[2024-07-31 06:44:11,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.38 | bwd_microstep: 4977.78 | bwd_inner_microstep: 4955.73 | bwd_allreduce_microstep: 21.98 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2088
[2024-07-31 06:44:20,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.49 | bwd_microstep: 5203.00 | bwd_inner_microstep: 4797.05 | bwd_allreduce_microstep: 405.88 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684
[2024-07-31 06:44:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.30 | bwd_microstep: 4876.75 | bwd_inner_microstep: 4857.50 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2148
[2024-07-31 06:44:37,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 06:44:37,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3012.72 | bwd_microstep: 4891.29 | bwd_inner_microstep: 4515.56 | bwd_allreduce_microstep: 375.66 | step_microstep: 183.13
[2024-07-31 06:44:37,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28102.99 | bwd: 40747.83 | bwd_inner: 39591.67 | bwd_allreduce: 1155.65 | step: 183.85
19%|█▉ | 231/1230 [4:32:43<19:20:28, 69.70s/it]
{'loss': 1.2238, 'learning_rate': 1.8723190268103634e-05, 'epoch': 0.19}
19%|█▉ | 231/1230 [4:32:43<19:20:28, 69.70s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2382
[2024-07-31 06:44:46,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.72 | bwd_microstep: 5411.02 | bwd_inner_microstep: 4994.92 | bwd_allreduce_microstep: 416.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801
[2024-07-31 06:44:55,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.32 | bwd_microstep: 5030.86 | bwd_inner_microstep: 5011.46 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598
[2024-07-31 06:45:03,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.45 | bwd_microstep: 5070.94 | bwd_inner_microstep: 5006.37 | bwd_allreduce_microstep: 64.50 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606
[2024-07-31 06:45:12,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.56 | bwd_microstep: 5228.79 | bwd_inner_microstep: 5138.35 | bwd_allreduce_microstep: 90.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 06:45:20,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.53 | bwd_microstep: 4864.00 | bwd_inner_microstep: 4821.06 | bwd_allreduce_microstep: 42.87 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727
[2024-07-31 06:45:29,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.19 | bwd_microstep: 5070.41 | bwd_inner_microstep: 5042.40 | bwd_allreduce_microstep: 27.94 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3645
[2024-07-31 06:45:38,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.46 | bwd_microstep: 5021.32 | bwd_inner_microstep: 4952.29 | bwd_allreduce_microstep: 68.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3685
[2024-07-31 06:45:47,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46
[2024-07-31 06:45:47,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.83 | bwd_microstep: 5239.84 | bwd_inner_microstep: 5134.91 | bwd_allreduce_microstep: 104.85 | step_microstep: 181.52
[2024-07-31 06:45:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28688.96 | bwd: 40937.16 | bwd_inner: 40101.70 | bwd_allreduce: 834.96 | step: 182.11
19%|█▉ | 232/1230 [4:33:53<19:20:37, 69.78s/it]
{'loss': 1.1813, 'learning_rate': 1.8710284586378645e-05, 'epoch': 0.19}
19%|█▉ | 232/1230 [4:33:53<19:20:37, 69.78s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 06:45:56,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3877.60 | bwd_microstep: 5378.15 | bwd_inner_microstep: 5358.93 | bwd_allreduce_microstep: 19.14 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3601
[2024-07-31 06:46:05,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.33 | bwd_microstep: 5237.70 | bwd_inner_microstep: 5167.23 | bwd_allreduce_microstep: 70.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606
[2024-07-31 06:46:14,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.49 | bwd_microstep: 5156.58 | bwd_inner_microstep: 5077.76 | bwd_allreduce_microstep: 78.75 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 06:46:22,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3303.16 | bwd_microstep: 5037.34 | bwd_inner_microstep: 4978.79 | bwd_allreduce_microstep: 58.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703
[2024-07-31 06:46:31,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.08 | bwd_microstep: 4889.49 | bwd_inner_microstep: 4870.15 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715
[2024-07-31 06:46:39,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.15 | bwd_microstep: 5004.69 | bwd_inner_microstep: 4973.00 | bwd_allreduce_microstep: 31.63 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753
[2024-07-31 06:46:48,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.28 | bwd_microstep: 5108.26 | bwd_inner_microstep: 5061.78 | bwd_allreduce_microstep: 46.42 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 06:46:57,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 06:46:57,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3442.73 | bwd_microstep: 5020.87 | bwd_inner_microstep: 4632.90 | bwd_allreduce_microstep: 387.90 | step_microstep: 183.03
[2024-07-31 06:46:57,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28702.73 | bwd: 40833.07 | bwd_inner: 40120.48 | bwd_allreduce: 712.09 | step: 183.61
19%|█▉ | 233/1230 [4:35:03<19:19:55, 69.81s/it]
{'loss': 1.2154, 'learning_rate': 1.8697318502704734e-05, 'epoch': 0.19}
19%|█▉ | 233/1230 [4:35:03<19:19:55, 69.81s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3562
[2024-07-31 06:47:05,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.67 | bwd_microstep: 5251.00 | bwd_inner_microstep: 5158.15 | bwd_allreduce_microstep: 92.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3789
[2024-07-31 06:47:14,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.26 | bwd_microstep: 5151.87 | bwd_inner_microstep: 5101.22 | bwd_allreduce_microstep: 50.59 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 06:47:23,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.32 | bwd_microstep: 5137.07 | bwd_inner_microstep: 5085.56 | bwd_allreduce_microstep: 51.44 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793
[2024-07-31 06:47:31,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.35 | bwd_microstep: 4951.81 | bwd_inner_microstep: 4923.32 | bwd_allreduce_microstep: 28.42 |
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 06:47:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.52 | bwd_microstep: 5090.73 | bwd_inner_microstep: 5046.90 | bwd_allreduce_microstep: 43.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 06:47:48,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.58 | bwd_microstep: 4988.10 | bwd_inner_microstep: 4940.61 | bwd_allreduce_microstep: 47.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 06:47:57,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.01 | bwd_microstep: 5059.16 | bwd_inner_microstep: 4999.67 | bwd_allreduce_microstep: 59.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 06:48:06,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 06:48:06,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.13 | bwd_microstep: 5074.73 | bwd_inner_microstep: 5012.05 | bwd_allreduce_microstep: 62.60 | step_microstep: 181.59 [2024-07-31 06:48:06,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28379.73 | bwd: 40704.45 | bwd_inner: 40267.43 | bwd_allreduce: 436.53 | step: 182.17 19%|█▉ | 234/1230 [4:36:12<19:16:50, 69.69s/it] {'loss': 1.1565, 'learning_rate': 1.8684292106995916e-05, 'epoch': 0.19} 19%|█▉ | 234/1230 [4:36:12<19:16:50, 69.69s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4050 [2024-07-31 06:48:15,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.95 | bwd_microstep: 5315.11 | bwd_inner_microstep: 5296.02 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token 
length: 2330 [2024-07-31 06:48:24,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.46 | bwd_microstep: 5356.33 | bwd_inner_microstep: 4941.68 | bwd_allreduce_microstep: 414.58 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-31 06:48:33,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.53 | bwd_microstep: 5327.18 | bwd_inner_microstep: 4913.40 | bwd_allreduce_microstep: 413.72 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2337 [2024-07-31 06:48:42,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.14 | bwd_microstep: 5261.10 | bwd_inner_microstep: 4851.57 | bwd_allreduce_microstep: 409.46 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2094 [2024-07-31 06:48:51,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.43 | bwd_microstep: 5121.14 | bwd_inner_microstep: 4723.17 | bwd_allreduce_microstep: 397.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 06:48:59,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.37 | bwd_microstep: 5134.43 | bwd_inner_microstep: 5057.83 | bwd_allreduce_microstep: 76.53 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 06:49:08,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.83 | bwd_microstep: 4931.52 | bwd_inner_microstep: 4901.44 | bwd_allreduce_microstep: 30.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 06:49:17,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 06:49:17,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.17 | bwd_microstep: 
4986.36 | bwd_inner_microstep: 4967.02 | bwd_allreduce_microstep: 19.27 | step_microstep: 181.75 [2024-07-31 06:49:17,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29093.75 | bwd: 41433.16 | bwd_inner: 39652.06 | bwd_allreduce: 1780.60 | step: 182.44 19%|█▉ | 235/1230 [4:37:23<19:21:29, 70.04s/it] {'loss': 1.1861, 'learning_rate': 1.8671205489584453e-05, 'epoch': 0.19} 19%|█▉ | 235/1230 [4:37:23<19:21:29, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 06:49:26,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.32 | bwd_microstep: 5348.01 | bwd_inner_microstep: 5329.00 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1727 [2024-07-31 06:49:35,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.08 | bwd_microstep: 5311.38 | bwd_inner_microstep: 4900.45 | bwd_allreduce_microstep: 410.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 06:49:43,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.93 | bwd_microstep: 4837.49 | bwd_inner_microstep: 4790.78 | bwd_allreduce_microstep: 46.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799 [2024-07-31 06:49:52,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.29 | bwd_microstep: 5081.42 | bwd_inner_microstep: 5041.98 | bwd_allreduce_microstep: 39.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 06:50:01,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.22 | bwd_microstep: 5179.51 | bwd_inner_microstep: 5121.19 | bwd_allreduce_microstep: 58.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2180 [2024-07-31 
06:50:09,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3449.89 | bwd_microstep: 5019.09 | bwd_inner_microstep: 4631.73 | bwd_allreduce_microstep: 387.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3665 [2024-07-31 06:50:18,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.44 | bwd_microstep: 5065.76 | bwd_inner_microstep: 4989.43 | bwd_allreduce_microstep: 76.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 06:50:27,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 06:50:27,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.84 | bwd_microstep: 4880.23 | bwd_inner_microstep: 4860.84 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.27 [2024-07-31 06:50:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28562.93 | bwd: 40722.88 | bwd_inner: 39665.33 | bwd_allreduce: 1057.06 | step: 181.85 19%|█▉ | 236/1230 [4:38:32<19:18:11, 69.91s/it] {'loss': 1.245, 'learning_rate': 1.865805874122021e-05, 'epoch': 0.19} 19%|█▉ | 236/1230 [4:38:32<19:18:11, 69.91s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3846 [2024-07-31 06:50:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.24 | bwd_microstep: 5231.31 | bwd_inner_microstep: 5184.88 | bwd_allreduce_microstep: 46.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-31 06:50:44,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.59 | bwd_microstep: 5288.92 | bwd_inner_microstep: 5219.45 | bwd_allreduce_microstep: 69.41 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3693 [2024-07-31 06:50:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.52 
| bwd_microstep: 5143.23 | bwd_inner_microstep: 5062.10 | bwd_allreduce_microstep: 81.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228 [2024-07-31 06:51:02,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.87 | bwd_microstep: 5098.48 | bwd_inner_microstep: 4702.04 | bwd_allreduce_microstep: 396.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 06:51:11,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.73 | bwd_microstep: 4999.86 | bwd_inner_microstep: 4980.49 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-31 06:51:19,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.62 | bwd_microstep: 5192.91 | bwd_inner_microstep: 5136.23 | bwd_allreduce_microstep: 56.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 06:51:28,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.30 | bwd_microstep: 5061.56 | bwd_inner_microstep: 4669.29 | bwd_allreduce_microstep: 392.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 06:51:37,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 06:51:37,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.62 | bwd_microstep: 5067.21 | bwd_inner_microstep: 5023.13 | bwd_allreduce_microstep: 44.01 | step_microstep: 181.86 [2024-07-31 06:51:37,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29002.41 | bwd: 41083.45 | bwd_inner: 39977.55 | bwd_allreduce: 1105.41 | step: 182.43 19%|█▉ | 237/1230 [4:39:43<19:19:32, 70.06s/it] {'loss': 1.1938, 'learning_rate': 1.8644851953070045e-05, 'epoch': 0.19} 19%|█▉ | 237/1230 
[4:39:43<19:19:32, 70.06s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2404 [2024-07-31 06:51:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.92 | bwd_microstep: 5507.83 | bwd_inner_microstep: 5084.92 | bwd_allreduce_microstep: 422.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3565 [2024-07-31 06:51:55,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3388.40 | bwd_microstep: 5280.21 | bwd_inner_microstep: 5182.70 | bwd_allreduce_microstep: 97.44 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3874 [2024-07-31 06:52:04,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.64 | bwd_microstep: 5190.33 | bwd_inner_microstep: 5141.17 | bwd_allreduce_microstep: 49.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 06:52:12,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.51 | bwd_microstep: 5197.29 | bwd_inner_microstep: 4793.95 | bwd_allreduce_microstep: 403.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 06:52:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.99 | bwd_microstep: 5125.24 | bwd_inner_microstep: 5057.36 | bwd_allreduce_microstep: 67.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 06:52:29,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3245.78 | bwd_microstep: 4789.35 | bwd_inner_microstep: 4753.91 | bwd_allreduce_microstep: 35.37 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3709 [2024-07-31 06:52:38,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.79 | bwd_microstep: 4904.10 
| bwd_inner_microstep: 4873.29 | bwd_allreduce_microstep: 30.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 06:52:47,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 06:52:47,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.89 | bwd_microstep: 5406.23 | bwd_inner_microstep: 5149.33 | bwd_allreduce_microstep: 256.83 | step_microstep: 181.94 [2024-07-31 06:52:47,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28261.83 | bwd: 41400.55 | bwd_inner: 40036.57 | bwd_allreduce: 1363.50 | step: 182.54 19%|█▉ | 238/1230 [4:40:53<19:18:02, 70.04s/it] {'loss': 1.1563, 'learning_rate': 1.863158521671716e-05, 'epoch': 0.19} 19%|█▉ | 238/1230 [4:40:53<19:18:02, 70.04s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-31 06:52:56,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.52 | bwd_microstep: 5331.05 | bwd_inner_microstep: 5257.66 | bwd_allreduce_microstep: 73.32 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-31 06:53:05,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.03 | bwd_microstep: 5222.59 | bwd_inner_microstep: 5136.12 | bwd_allreduce_microstep: 86.39 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1121 [2024-07-31 06:53:13,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.54 | bwd_microstep: 5079.04 | bwd_inner_microstep: 4690.86 | bwd_allreduce_microstep: 388.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 06:53:21,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.01 | bwd_microstep: 4954.19 | bwd_inner_microstep: 4923.55 | bwd_allreduce_microstep: 30.57 | step_microstep: 0.18 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-31 06:53:30,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.16 | bwd_microstep: 4979.53 | bwd_inner_microstep: 4947.46 | bwd_allreduce_microstep: 32.00 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3675 [2024-07-31 06:53:39,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.11 | bwd_microstep: 5008.69 | bwd_inner_microstep: 4938.23 | bwd_allreduce_microstep: 70.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 06:53:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.37 | bwd_microstep: 5058.20 | bwd_inner_microstep: 5002.55 | bwd_allreduce_microstep: 55.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 06:53:56,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 06:53:56,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.80 | bwd_microstep: 4990.81 | bwd_inner_microstep: 4971.52 | bwd_allreduce_microstep: 19.22 | step_microstep: 183.11 [2024-07-31 06:53:56,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28306.44 | bwd: 40624.08 | bwd_inner: 39867.88 | bwd_allreduce: 755.70 | step: 183.82 19%|█▉ | 239/1230 [4:42:02<19:13:01, 69.81s/it] {'loss': 1.2807, 'learning_rate': 1.8618258624160465e-05, 'epoch': 0.19} 19%|█▉ | 239/1230 [4:42:02<19:13:01, 69.81s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3553 [2024-07-31 06:54:05,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.41 | bwd_microstep: 5216.42 | bwd_inner_microstep: 5130.11 | bwd_allreduce_microstep: 86.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 
[2024-07-31 06:54:14,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.33 | bwd_microstep: 5462.31 | bwd_inner_microstep: 5371.06 | bwd_allreduce_microstep: 91.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2322 [2024-07-31 06:54:23,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.89 | bwd_microstep: 5372.08 | bwd_inner_microstep: 4955.89 | bwd_allreduce_microstep: 416.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-31 06:54:31,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3044.94 | bwd_microstep: 5032.63 | bwd_inner_microstep: 4644.07 | bwd_allreduce_microstep: 388.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2335 [2024-07-31 06:54:39,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.86 | bwd_microstep: 4874.28 | bwd_inner_microstep: 4498.55 | bwd_allreduce_microstep: 375.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 06:54:48,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.25 | bwd_microstep: 5029.92 | bwd_inner_microstep: 5005.03 | bwd_allreduce_microstep: 24.82 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3696 [2024-07-31 06:54:57,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.34 | bwd_microstep: 5133.65 | bwd_inner_microstep: 5050.64 | bwd_allreduce_microstep: 82.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-31 06:55:05,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-31 06:55:05,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.49 | bwd_microstep: 4753.37 | 
bwd_inner_microstep: 4724.10 | bwd_allreduce_microstep: 29.20 | step_microstep: 181.91 [2024-07-31 06:55:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27531.42 | bwd: 40874.63 | bwd_inner: 39379.39 | bwd_allreduce: 1494.76 | step: 182.50 20%|█▉ | 240/1230 [4:43:11<19:06:32, 69.49s/it] {'loss': 1.2266, 'learning_rate': 1.8604872267813954e-05, 'epoch': 0.2} 20%|█▉ | 240/1230 [4:43:11<19:06:32, 69.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3942 [2024-07-31 06:55:14,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.00 | bwd_microstep: 5400.55 | bwd_inner_microstep: 5336.63 | bwd_allreduce_microstep: 63.85 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2035 [2024-07-31 06:55:23,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.18 | bwd_microstep: 5216.49 | bwd_inner_microstep: 4812.40 | bwd_allreduce_microstep: 404.02 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3607 [2024-07-31 06:55:31,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3133.83 | bwd_microstep: 5049.64 | bwd_inner_microstep: 4979.38 | bwd_allreduce_microstep: 70.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 06:55:40,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.62 | bwd_microstep: 5118.22 | bwd_inner_microstep: 5046.82 | bwd_allreduce_microstep: 71.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 06:55:48,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.48 | bwd_microstep: 4978.84 | bwd_inner_microstep: 4959.44 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 06:55:57,784] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.42 | bwd_microstep: 5061.67 | bwd_inner_microstep: 5034.35 | bwd_allreduce_microstep: 27.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 06:56:05,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.79 | bwd_microstep: 4724.30 | bwd_inner_microstep: 4699.05 | bwd_allreduce_microstep: 25.18 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 06:56:14,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 06:56:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.47 | bwd_microstep: 5054.55 | bwd_inner_microstep: 4662.44 | bwd_allreduce_microstep: 392.04 | step_microstep: 182.92 [2024-07-31 06:56:14,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28123.69 | bwd: 40604.23 | bwd_inner: 39530.46 | bwd_allreduce: 1073.28 | step: 183.51 20%|█▉ | 241/1230 [4:44:20<19:03:16, 69.36s/it] {'loss': 1.2749, 'learning_rate': 1.859142624050605e-05, 'epoch': 0.2} 20%|█▉ | 241/1230 [4:44:20<19:03:16, 69.36s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3950 [2024-07-31 06:56:23,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.80 | bwd_microstep: 5290.93 | bwd_inner_microstep: 5239.99 | bwd_allreduce_microstep: 50.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3872 [2024-07-31 06:56:32,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.74 | bwd_microstep: 5083.84 | bwd_inner_microstep: 5052.86 | bwd_allreduce_microstep: 30.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-31 06:56:40,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.96 | 
bwd_microstep: 5117.12 | bwd_inner_microstep: 5074.54 | bwd_allreduce_microstep: 42.52 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3613 [2024-07-31 06:56:49,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.87 | bwd_microstep: 5154.47 | bwd_inner_microstep: 5061.02 | bwd_allreduce_microstep: 93.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 06:56:58,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.11 | bwd_microstep: 4999.91 | bwd_inner_microstep: 4961.49 | bwd_allreduce_microstep: 38.35 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 06:57:07,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.76 | bwd_microstep: 4953.02 | bwd_inner_microstep: 4919.42 | bwd_allreduce_microstep: 33.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 06:57:15,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.68 | bwd_microstep: 5223.80 | bwd_inner_microstep: 4818.21 | bwd_allreduce_microstep: 405.53 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 06:57:24,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 06:57:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.52 | bwd_microstep: 5058.58 | bwd_inner_microstep: 4998.35 | bwd_allreduce_microstep: 60.16 | step_microstep: 181.54 [2024-07-31 06:57:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29070.34 | bwd: 40881.65 | bwd_inner: 40125.81 | bwd_allreduce: 755.36 | step: 182.24 20%|█▉ | 242/1230 [4:45:30<19:06:41, 69.64s/it] {'loss': 1.2349, 'learning_rate': 1.8577920635478976e-05, 'epoch': 0.2} 20%|█▉ | 242/1230 
[4:45:30<19:06:41, 69.64s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2963 [2024-07-31 06:57:33,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.63 | bwd_microstep: 5433.57 | bwd_inner_microstep: 5015.66 | bwd_allreduce_microstep: 417.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 06:57:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.74 | bwd_microstep: 5311.93 | bwd_inner_microstep: 5242.87 | bwd_allreduce_microstep: 68.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 06:57:50,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.61 | bwd_microstep: 4818.47 | bwd_inner_microstep: 4796.46 | bwd_allreduce_microstep: 21.94 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-31 06:57:59,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.75 | bwd_microstep: 5002.13 | bwd_inner_microstep: 4982.78 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-31 06:58:07,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3255.02 | bwd_microstep: 4865.74 | bwd_inner_microstep: 4838.72 | bwd_allreduce_microstep: 26.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 06:58:15,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.78 | bwd_microstep: 4863.18 | bwd_inner_microstep: 4811.71 | bwd_allreduce_microstep: 51.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 06:58:24,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.42 | bwd_microstep: 5120.10 
| bwd_inner_microstep: 5051.13 | bwd_allreduce_microstep: 68.89 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3683 [2024-07-31 06:58:33,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 06:58:33,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.07 | bwd_microstep: 5042.33 | bwd_inner_microstep: 4970.56 | bwd_allreduce_microstep: 71.71 | step_microstep: 182.23 [2024-07-31 06:58:33,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27948.92 | bwd: 40457.43 | bwd_inner: 39709.83 | bwd_allreduce: 747.11 | step: 182.83 20%|█▉ | 243/1230 [4:46:39<19:01:05, 69.37s/it] {'loss': 1.2175, 'learning_rate': 1.8564355546388094e-05, 'epoch': 0.2} 20%|█▉ | 243/1230 [4:46:39<19:01:05, 69.37s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 06:58:42,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3881.41 | bwd_microstep: 5369.16 | bwd_inner_microstep: 5350.08 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 06:58:51,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.32 | bwd_microstep: 5109.91 | bwd_inner_microstep: 5067.76 | bwd_allreduce_microstep: 42.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3824 [2024-07-31 06:58:59,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3128.21 | bwd_microstep: 4940.43 | bwd_inner_microstep: 4901.60 | bwd_allreduce_microstep: 38.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 06:59:08,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.10 | bwd_microstep: 5157.06 | bwd_inner_microstep: 5077.79 | bwd_allreduce_microstep: 79.20 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 06:59:17,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.46 | bwd_microstep: 5139.38 | bwd_inner_microstep: 5061.69 | bwd_allreduce_microstep: 77.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 06:59:25,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.07 | bwd_microstep: 4914.14 | bwd_inner_microstep: 4894.35 | bwd_allreduce_microstep: 19.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 06:59:34,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.10 | bwd_microstep: 4988.07 | bwd_inner_microstep: 4935.45 | bwd_allreduce_microstep: 52.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3713 [2024-07-31 06:59:42,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 06:59:42,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3126.81 | bwd_microstep: 5024.18 | bwd_inner_microstep: 4972.46 | bwd_allreduce_microstep: 51.65 | step_microstep: 181.79 [2024-07-31 06:59:42,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28199.39 | bwd: 40642.31 | bwd_inner: 40261.12 | bwd_allreduce: 380.71 | step: 182.36 20%|█▉ | 244/1230 [4:47:48<18:58:58, 69.31s/it] {'loss': 1.2175, 'learning_rate': 1.855073106730126e-05, 'epoch': 0.2} 20%|█▉ | 244/1230 [4:47:48<18:58:58, 69.31s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3527 [2024-07-31 06:59:51,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.57 | bwd_microstep: 5281.16 | bwd_inner_microstep: 5179.27 | bwd_allreduce_microstep: 101.82 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 
[2024-07-31 07:00:00,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.85 | bwd_microstep: 5186.10 | bwd_inner_microstep: 5106.12 | bwd_allreduce_microstep: 79.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 07:00:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.21 | bwd_microstep: 5174.49 | bwd_inner_microstep: 5119.29 | bwd_allreduce_microstep: 55.13 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3856 [2024-07-31 07:00:18,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.24 | bwd_microstep: 5155.36 | bwd_inner_microstep: 5125.37 | bwd_allreduce_microstep: 29.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 07:00:26,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.91 | bwd_microstep: 5055.66 | bwd_inner_microstep: 5033.28 | bwd_allreduce_microstep: 22.31 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 07:00:35,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.01 | bwd_microstep: 5256.16 | bwd_inner_microstep: 4849.15 | bwd_allreduce_microstep: 406.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3706 [2024-07-31 07:00:44,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.13 | bwd_microstep: 4997.46 | bwd_inner_microstep: 4933.01 | bwd_allreduce_microstep: 64.38 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 07:00:53,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 07:00:53,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.88 | bwd_microstep: 5071.68 | 
bwd_inner_microstep: 5009.29 | bwd_allreduce_microstep: 62.32 | step_microstep: 182.52 [2024-07-31 07:00:53,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28959.69 | bwd: 41178.05 | bwd_inner: 40354.72 | bwd_allreduce: 822.82 | step: 183.12 20%|█▉ | 245/1230 [4:48:59<19:03:33, 69.66s/it] {'loss': 1.1908, 'learning_rate': 1.8537047292698175e-05, 'epoch': 0.2} 20%|█▉ | 245/1230 [4:48:59<19:03:33, 69.66s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 07:01:02,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.95 | bwd_microstep: 5414.28 | bwd_inner_microstep: 5395.20 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 07:01:11,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.65 | bwd_microstep: 5162.41 | bwd_inner_microstep: 4759.51 | bwd_allreduce_microstep: 402.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 07:01:19,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.60 | bwd_microstep: 5147.59 | bwd_inner_microstep: 5072.59 | bwd_allreduce_microstep: 74.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 07:01:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.09 | bwd_microstep: 4947.56 | bwd_inner_microstep: 4918.30 | bwd_allreduce_microstep: 29.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 07:01:37,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.46 | bwd_microstep: 5170.14 | bwd_inner_microstep: 5095.36 | bwd_allreduce_microstep: 74.71 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 07:01:45,987] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.98 | bwd_microstep: 5009.98 | bwd_inner_microstep: 4952.67 | bwd_allreduce_microstep: 57.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 07:01:54,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.10 | bwd_microstep: 5049.22 | bwd_inner_microstep: 4657.29 | bwd_allreduce_microstep: 391.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 07:02:03,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 07:02:03,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.07 | bwd_microstep: 5339.71 | bwd_inner_microstep: 5161.31 | bwd_allreduce_microstep: 178.32 | step_microstep: 181.16 [2024-07-31 07:02:03,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28895.81 | bwd: 41240.87 | bwd_inner: 40012.18 | bwd_allreduce: 1228.19 | step: 181.85 20%|██ | 246/1230 [4:50:09<19:06:23, 69.90s/it] {'loss': 1.2191, 'learning_rate': 1.852330431746973e-05, 'epoch': 0.2} 20%|██ | 246/1230 [4:50:09<19:06:23, 69.90s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2381 [2024-07-31 07:02:12,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.03 | bwd_microstep: 5554.60 | bwd_inner_microstep: 5129.71 | bwd_allreduce_microstep: 424.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-31 07:02:21,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.00 | bwd_microstep: 5162.57 | bwd_inner_microstep: 5082.33 | bwd_allreduce_microstep: 80.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3919 [2024-07-31 07:02:30,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.00 | 
bwd_microstep: 5034.07 | bwd_inner_microstep: 5008.40 | bwd_allreduce_microstep: 25.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3938 [2024-07-31 07:02:39,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.77 | bwd_microstep: 5258.77 | bwd_inner_microstep: 5213.26 | bwd_allreduce_microstep: 45.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 07:02:47,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.08 | bwd_microstep: 5178.60 | bwd_inner_microstep: 5123.28 | bwd_allreduce_microstep: 55.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-31 07:02:55,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.88 | bwd_microstep: 4863.82 | bwd_inner_microstep: 4824.24 | bwd_allreduce_microstep: 39.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 07:03:04,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.21 | bwd_microstep: 5156.15 | bwd_inner_microstep: 5086.33 | bwd_allreduce_microstep: 69.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 07:03:12,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 07:03:12,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.58 | bwd_microstep: 4731.09 | bwd_inner_microstep: 4704.87 | bwd_allreduce_microstep: 26.15 | step_microstep: 181.49 [2024-07-31 07:03:12,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27981.46 | bwd: 40939.66 | bwd_inner: 40172.35 | bwd_allreduce: 766.83 | step: 182.07 20%|██ | 247/1230 [4:51:18<19:02:01, 69.71s/it] {'loss': 1.2083, 'learning_rate': 1.8509502236917353e-05, 'epoch': 0.2} 20%|██ | 247/1230 
[4:51:18<19:02:01, 69.71s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3830 [2024-07-31 07:03:22,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.94 | bwd_microstep: 5444.51 | bwd_inner_microstep: 5368.48 | bwd_allreduce_microstep: 75.96 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3822 [2024-07-31 07:03:31,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.30 | bwd_microstep: 5228.30 | bwd_inner_microstep: 5185.91 | bwd_allreduce_microstep: 42.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-31 07:03:39,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3089.94 | bwd_microstep: 5113.68 | bwd_inner_microstep: 4722.12 | bwd_allreduce_microstep: 391.50 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-31 07:03:48,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.87 | bwd_microstep: 5250.98 | bwd_inner_microstep: 4844.15 | bwd_allreduce_microstep: 406.75 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 07:03:56,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.65 | bwd_microstep: 5102.99 | bwd_inner_microstep: 5037.23 | bwd_allreduce_microstep: 65.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-31 07:04:05,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.23 | bwd_microstep: 5005.05 | bwd_inner_microstep: 4985.71 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 07:04:14,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.97 | bwd_microstep: 5022.79 
| bwd_inner_microstep: 4969.71 | bwd_allreduce_microstep: 53.01 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2110 [2024-07-31 07:04:22,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 07:04:22,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.25 | bwd_microstep: 4893.03 | bwd_inner_microstep: 4516.85 | bwd_allreduce_microstep: 376.11 | step_microstep: 182.70 [2024-07-31 07:04:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27962.06 | bwd: 41061.32 | bwd_inner: 39630.10 | bwd_allreduce: 1430.72 | step: 183.31 20%|██ | 248/1230 [4:52:28<18:59:08, 69.60s/it] {'loss': 1.2827, 'learning_rate': 1.8495641146752322e-05, 'epoch': 0.2} 20%|██ | 248/1230 [4:52:28<18:59:08, 69.60s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2433 [2024-07-31 07:04:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.42 | bwd_microstep: 5538.57 | bwd_inner_microstep: 5113.23 | bwd_allreduce_microstep: 425.27 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2248 [2024-07-31 07:04:40,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.82 | bwd_microstep: 5186.82 | bwd_inner_microstep: 4783.33 | bwd_allreduce_microstep: 403.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 07:04:48,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.83 | bwd_microstep: 5143.93 | bwd_inner_microstep: 5092.12 | bwd_allreduce_microstep: 51.74 | step_microstep: 0.18 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3746 [2024-07-31 07:04:57,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.44 | bwd_microstep: 5122.90 | bwd_inner_microstep: 5067.39 | bwd_allreduce_microstep: 55.45 | step_microstep: 0.08 
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3656 [2024-07-31 07:05:06,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.20 | bwd_microstep: 5027.15 | bwd_inner_microstep: 4952.96 | bwd_allreduce_microstep: 74.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 07:05:14,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.78 | bwd_microstep: 4726.79 | bwd_inner_microstep: 4699.83 | bwd_allreduce_microstep: 26.89 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2677 [2024-07-31 07:05:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.55 | bwd_microstep: 5208.40 | bwd_inner_microstep: 4802.80 | bwd_allreduce_microstep: 405.53 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 07:05:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 07:05:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.33 | bwd_microstep: 5026.81 | bwd_inner_microstep: 4970.59 | bwd_allreduce_microstep: 56.14 | step_microstep: 181.28 [2024-07-31 07:05:31,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28233.27 | bwd: 40981.34 | bwd_inner: 39482.19 | bwd_allreduce: 1498.65 | step: 181.99 20%|██ | 249/1230 [4:53:37<18:57:42, 69.59s/it] {'loss': 1.2277, 'learning_rate': 1.848172114309513e-05, 'epoch': 0.2} 20%|██ | 249/1230 [4:53:37<18:57:42, 69.59s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3913 [2024-07-31 07:05:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.40 | bwd_microstep: 5610.34 | bwd_inner_microstep: 5511.60 | bwd_allreduce_microstep: 98.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4023 
[2024-07-31 07:05:49,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.88 | bwd_microstep: 5068.11 | bwd_inner_microstep: 5048.85 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3893 [2024-07-31 07:05:58,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.81 | bwd_microstep: 5150.63 | bwd_inner_microstep: 5111.15 | bwd_allreduce_microstep: 39.41 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1179 [2024-07-31 07:06:07,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.19 | bwd_microstep: 5258.92 | bwd_inner_microstep: 4854.66 | bwd_allreduce_microstep: 404.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-31 07:06:16,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.86 | bwd_microstep: 5137.89 | bwd_inner_microstep: 4738.69 | bwd_allreduce_microstep: 399.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 07:06:24,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.85 | bwd_microstep: 5012.22 | bwd_inner_microstep: 4985.69 | bwd_allreduce_microstep: 26.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 07:06:33,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.59 | bwd_microstep: 5057.35 | bwd_inner_microstep: 4998.19 | bwd_allreduce_microstep: 59.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 07:06:42,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 07:06:42,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.72 | bwd_microstep: 5010.28 | 
bwd_inner_microstep: 4958.88 | bwd_allreduce_microstep: 51.34 | step_microstep: 181.51 [2024-07-31 07:06:42,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28891.19 | bwd: 41305.72 | bwd_inner: 40207.64 | bwd_allreduce: 1097.59 | step: 182.09 20%|██ | 250/1230 [4:54:48<19:01:11, 69.87s/it] {'loss': 1.2175, 'learning_rate': 1.8467742322474822e-05, 'epoch': 0.2} 20%|██ | 250/1230 [4:54:48<19:01:11, 69.87s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2370 [2024-07-31 07:06:51,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.69 | bwd_microstep: 5396.66 | bwd_inner_microstep: 4984.91 | bwd_allreduce_microstep: 411.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4024 [2024-07-31 07:07:00,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.15 | bwd_microstep: 5158.70 | bwd_inner_microstep: 5130.17 | bwd_allreduce_microstep: 28.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-31 07:07:09,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.31 | bwd_microstep: 5224.15 | bwd_inner_microstep: 4818.60 | bwd_allreduce_microstep: 405.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 07:07:16,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3187.30 | bwd_microstep: 4764.09 | bwd_inner_microstep: 4728.68 | bwd_allreduce_microstep: 35.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 07:07:25,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.05 | bwd_microstep: 5184.63 | bwd_inner_microstep: 5131.91 | bwd_allreduce_microstep: 52.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3637 [2024-07-31 
07:07:34,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.76 | bwd_microstep: 4910.07 | bwd_inner_microstep: 4880.78 | bwd_allreduce_microstep: 29.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 07:07:43,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.28 | bwd_microstep: 4895.38 | bwd_inner_microstep: 4875.96 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 07:07:51,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 07:07:51,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.84 | bwd_microstep: 5013.72 | bwd_inner_microstep: 4956.53 | bwd_allreduce_microstep: 57.12 | step_microstep: 181.60 [2024-07-31 07:07:51,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28649.28 | bwd: 40547.36 | bwd_inner: 39507.46 | bwd_allreduce: 1039.39 | step: 182.19 20%|██ | 251/1230 [4:55:57<18:58:22, 69.77s/it] {'loss': 1.2498, 'learning_rate': 1.845370478182829e-05, 'epoch': 0.2} 20%|██ | 251/1230 [4:55:57<18:58:22, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2442 [2024-07-31 07:08:00,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.31 | bwd_microstep: 5297.11 | bwd_inner_microstep: 4888.55 | bwd_allreduce_microstep: 408.49 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4010 [2024-07-31 07:08:09,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.17 | bwd_microstep: 5167.24 | bwd_inner_microstep: 5147.84 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3854 [2024-07-31 07:08:18,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.51 
| bwd_microstep: 5099.73 | bwd_inner_microstep: 5080.46 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3775 [2024-07-31 07:08:27,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.34 | bwd_microstep: 5117.44 | bwd_inner_microstep: 5055.53 | bwd_allreduce_microstep: 61.84 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 07:08:35,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.53 | bwd_microstep: 4939.74 | bwd_inner_microstep: 4915.26 | bwd_allreduce_microstep: 24.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 07:08:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.17 | bwd_microstep: 5237.71 | bwd_inner_microstep: 4831.42 | bwd_allreduce_microstep: 406.21 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 07:08:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.90 | bwd_microstep: 5167.97 | bwd_inner_microstep: 5097.55 | bwd_allreduce_microstep: 70.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-31 07:09:01,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 07:09:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3272.76 | bwd_microstep: 4931.67 | bwd_inner_microstep: 4550.64 | bwd_allreduce_microstep: 380.97 | step_microstep: 181.82 [2024-07-31 07:09:01,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28679.61 | bwd: 40958.58 | bwd_inner: 39567.19 | bwd_allreduce: 1390.90 | step: 182.43 20%|██ | 252/1230 [4:57:07<18:58:12, 69.83s/it] {'loss': 1.2211, 'learning_rate': 1.8439608618499637e-05, 'epoch': 0.2} 20%|██ | 252/1230 
[4:57:07<18:58:12, 69.83s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4044 [2024-07-31 07:09:11,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.55 | bwd_microstep: 5322.86 | bwd_inner_microstep: 5303.71 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3567 [2024-07-31 07:09:19,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.61 | bwd_microstep: 5063.54 | bwd_inner_microstep: 4992.48 | bwd_allreduce_microstep: 70.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 07:09:28,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.65 | bwd_microstep: 5091.59 | bwd_inner_microstep: 5046.94 | bwd_allreduce_microstep: 44.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 07:09:37,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.96 | bwd_microstep: 5114.71 | bwd_inner_microstep: 5035.35 | bwd_allreduce_microstep: 79.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 07:09:45,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.76 | bwd_microstep: 5233.54 | bwd_inner_microstep: 4829.14 | bwd_allreduce_microstep: 404.33 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3711 [2024-07-31 07:09:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.30 | bwd_microstep: 5111.32 | bwd_inner_microstep: 5032.41 | bwd_allreduce_microstep: 78.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 07:10:03,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.92 | bwd_microstep: 5005.39 
| bwd_inner_microstep: 4956.03 | bwd_allreduce_microstep: 49.29 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 07:10:12,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 07:10:12,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.07 | bwd_microstep: 5056.94 | bwd_inner_microstep: 5016.46 | bwd_allreduce_microstep: 40.41 | step_microstep: 181.49 [2024-07-31 07:10:12,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29002.71 | bwd: 40999.88 | bwd_inner: 40212.45 | bwd_allreduce: 786.93 | step: 182.07 21%|██ | 253/1230 [4:58:18<18:59:32, 69.98s/it] {'loss': 1.2003, 'learning_rate': 1.842545393023949e-05, 'epoch': 0.21} 21%|██ | 253/1230 [4:58:18<18:59:32, 69.98s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3938 [2024-07-31 07:10:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.86 | bwd_microstep: 5172.61 | bwd_inner_microstep: 5123.96 | bwd_allreduce_microstep: 48.58 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3823 [2024-07-31 07:10:29,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.73 | bwd_microstep: 5200.23 | bwd_inner_microstep: 5161.29 | bwd_allreduce_microstep: 38.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-31 07:10:38,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.14 | bwd_microstep: 5011.58 | bwd_inner_microstep: 4992.31 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-31 07:10:47,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.43 | bwd_microstep: 5169.45 | bwd_inner_microstep: 5112.73 | bwd_allreduce_microstep: 56.65 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 07:10:56,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.33 | bwd_microstep: 5101.11 | bwd_inner_microstep: 5029.55 | bwd_allreduce_microstep: 71.49 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1078 [2024-07-31 07:11:04,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.57 | bwd_microstep: 5134.15 | bwd_inner_microstep: 4735.75 | bwd_allreduce_microstep: 398.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 07:11:12,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.27 | bwd_microstep: 4918.19 | bwd_inner_microstep: 4538.40 | bwd_allreduce_microstep: 379.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 07:11:21,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 07:11:21,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.66 | bwd_microstep: 4878.31 | bwd_inner_microstep: 4859.08 | bwd_allreduce_microstep: 19.16 | step_microstep: 181.02 [2024-07-31 07:11:21,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28412.89 | bwd: 40585.61 | bwd_inner: 39553.01 | bwd_allreduce: 1032.10 | step: 181.61 21%|██ | 254/1230 [4:59:27<18:55:11, 69.79s/it] {'loss': 1.215, 'learning_rate': 1.841124081520431e-05, 'epoch': 0.21} 21%|██ | 254/1230 [4:59:27<18:55:11, 69.79s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3909 [2024-07-31 07:11:30,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.29 | bwd_microstep: 5444.28 | bwd_inner_microstep: 5373.66 | bwd_allreduce_microstep: 70.55 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3742 
[2024-07-31 07:11:39,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.31 | bwd_microstep: 4921.87 | bwd_inner_microstep: 4902.57 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823 [2024-07-31 07:11:47,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.32 | bwd_microstep: 5162.45 | bwd_inner_microstep: 5118.18 | bwd_allreduce_microstep: 44.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 07:11:56,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.95 | bwd_microstep: 5174.61 | bwd_inner_microstep: 5118.35 | bwd_allreduce_microstep: 56.18 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3713 [2024-07-31 07:12:05,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3345.83 | bwd_microstep: 4893.17 | bwd_inner_microstep: 4863.95 | bwd_allreduce_microstep: 29.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 07:12:13,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.46 | bwd_microstep: 4890.93 | bwd_inner_microstep: 4871.58 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 07:12:22,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.33 | bwd_microstep: 5053.72 | bwd_inner_microstep: 4992.61 | bwd_allreduce_microstep: 61.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 07:12:31,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 07:12:31,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.62 | bwd_microstep: 5097.94 | 
bwd_inner_microstep: 4700.13 | bwd_allreduce_microstep: 397.74 | step_microstep: 181.08 [2024-07-31 07:12:31,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28680.01 | bwd: 40638.93 | bwd_inner: 39940.98 | bwd_allreduce: 697.45 | step: 181.67 21%|██ | 255/1230 [5:00:37<18:53:22, 69.75s/it] {'loss': 1.1724, 'learning_rate': 1.8396969371955724e-05, 'epoch': 0.21} 21%|██ | 255/1230 [5:00:37<18:53:22, 69.75s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3991 [2024-07-31 07:12:40,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.47 | bwd_microstep: 5298.02 | bwd_inner_microstep: 5276.96 | bwd_allreduce_microstep: 20.99 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-31 07:12:49,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.05 | bwd_microstep: 5039.46 | bwd_inner_microstep: 5019.78 | bwd_allreduce_microstep: 19.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-31 07:12:57,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.24 | bwd_microstep: 5008.10 | bwd_inner_microstep: 4988.77 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 07:13:06,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.74 | bwd_microstep: 5041.28 | bwd_inner_microstep: 5018.69 | bwd_allreduce_microstep: 22.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 07:13:15,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.67 | bwd_microstep: 5051.78 | bwd_inner_microstep: 5025.63 | bwd_allreduce_microstep: 26.08 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 
07:13:24,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.83 | bwd_microstep: 5017.23 | bwd_inner_microstep: 4979.34 | bwd_allreduce_microstep: 37.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698
[2024-07-31 07:13:32,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.26 | bwd_microstep: 4892.69 | bwd_inner_microstep: 4873.32 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 07:13:41,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 07:13:41,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.27 | bwd_microstep: 5035.37 | bwd_inner_microstep: 4978.86 | bwd_allreduce_microstep: 56.44 | step_microstep: 182.06
[2024-07-31 07:13:41,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29812.44 | bwd: 40383.90 | bwd_inner: 40161.28 | bwd_allreduce: 222.11 | step: 182.76
21%|██ | 256/1230 [5:01:47<18:56:04, 69.98s/it]
{'loss': 1.2061, 'learning_rate': 1.838263969945985e-05, 'epoch': 0.21}
21%|██ | 256/1230 [5:01:47<18:56:04, 69.98s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4044
[2024-07-31 07:13:50,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.55 | bwd_microstep: 5295.93 | bwd_inner_microstep: 5264.43 | bwd_allreduce_microstep: 31.43 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3567
[2024-07-31 07:13:59,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.48 | bwd_microstep: 5218.94 | bwd_inner_microstep: 5125.59 | bwd_allreduce_microstep: 93.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 07:14:07,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.24 | bwd_microstep: 4850.09 | bwd_inner_microstep: 4800.48 | bwd_allreduce_microstep: 49.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 07:14:16,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.98 | bwd_microstep: 4962.53 | bwd_inner_microstep: 4909.36 | bwd_allreduce_microstep: 53.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714
[2024-07-31 07:14:24,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.00 | bwd_microstep: 4973.89 | bwd_inner_microstep: 4954.53 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3642
[2024-07-31 07:14:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3105.61 | bwd_microstep: 4864.71 | bwd_inner_microstep: 4817.88 | bwd_allreduce_microstep: 46.77 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3690
[2024-07-31 07:14:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.32 | bwd_microstep: 4829.02 | bwd_inner_microstep: 4787.52 | bwd_allreduce_microstep: 41.44 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3672
[2024-07-31 07:14:49,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 07:14:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.51 | bwd_microstep: 5054.36 | bwd_inner_microstep: 4988.17 | bwd_allreduce_microstep: 66.13 | step_microstep: 181.73
[2024-07-31 07:14:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27603.59 | bwd: 40049.46 | bwd_inner: 39647.89 | bwd_allreduce: 401.08 | step: 182.30
21%|██ | 257/1230 [5:02:55<18:45:11, 69.38s/it]
{'loss': 1.2767, 'learning_rate': 1.836825189708659e-05, 'epoch': 0.21}
21%|██ | 257/1230 [5:02:55<18:45:11, 69.38s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4020
[2024-07-31 07:14:58,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.83 | bwd_microstep: 5263.16 | bwd_inner_microstep: 5243.96 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210
[2024-07-31 07:15:07,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.01 | bwd_microstep: 5228.73 | bwd_inner_microstep: 4820.65 | bwd_allreduce_microstep: 408.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790
[2024-07-31 07:15:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.36 | bwd_microstep: 4830.82 | bwd_inner_microstep: 4811.49 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 07:15:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.74 | bwd_microstep: 5181.87 | bwd_inner_microstep: 5101.04 | bwd_allreduce_microstep: 80.76 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708
[2024-07-31 07:15:33,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.11 | bwd_microstep: 5016.31 | bwd_inner_microstep: 4974.11 | bwd_allreduce_microstep: 42.13 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 07:15:42,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.58 | bwd_microstep: 5023.02 | bwd_inner_microstep: 4999.26 | bwd_allreduce_microstep: 23.69 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 07:15:51,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.96 | bwd_microstep: 5418.50 | bwd_inner_microstep: 5250.65 | bwd_allreduce_microstep: 167.77 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2118
[2024-07-31 07:15:59,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 07:15:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.57 | bwd_microstep: 5056.43 | bwd_inner_microstep: 4666.13 | bwd_allreduce_microstep: 390.23 | step_microstep: 182.05
[2024-07-31 07:15:59,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28811.06 | bwd: 41018.82 | bwd_inner: 39867.25 | bwd_allreduce: 1151.08 | step: 182.65
21%|██ | 258/1230 [5:04:05<18:47:49, 69.62s/it]
{'loss': 1.2336, 'learning_rate': 1.8353806064608953e-05, 'epoch': 0.21}
21%|██ | 258/1230 [5:04:05<18:47:49, 69.62s/it]
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096
[2024-07-31 07:16:09,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.96 | bwd_microstep: 5484.48 | bwd_inner_microstep: 5445.32 | bwd_allreduce_microstep: 39.09 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755
[2024-07-31 07:16:17,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.46 | bwd_microstep: 5191.41 | bwd_inner_microstep: 5134.08 | bwd_allreduce_microstep: 57.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701
[2024-07-31 07:16:26,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.59 | bwd_microstep: 5092.03 | bwd_inner_microstep: 5041.52 | bwd_allreduce_microstep: 50.44 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667
[2024-07-31 07:16:35,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.49 | bwd_microstep: 5119.22 | bwd_inner_microstep: 5051.97 | bwd_allreduce_microstep: 67.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618
[2024-07-31 07:16:44,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.09 | bwd_microstep: 5107.51 | bwd_inner_microstep: 5037.65 | bwd_allreduce_microstep: 69.79 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715
[2024-07-31 07:16:52,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.19 | bwd_microstep: 4939.83 | bwd_inner_microstep: 4909.76 | bwd_allreduce_microstep: 30.00 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677
[2024-07-31 07:17:00,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3348.60 | bwd_microstep: 4863.74 | bwd_inner_microstep: 4829.59 | bwd_allreduce_microstep: 34.09 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124
[2024-07-31 07:17:09,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 07:17:09,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3000.92 | bwd_microstep: 4885.07 | bwd_inner_microstep: 4509.23 | bwd_allreduce_microstep: 375.77 | step_microstep: 182.16
[2024-07-31 07:17:09,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28241.21 | bwd: 40683.27 | bwd_inner: 39959.06 | bwd_allreduce: 723.72 | step: 182.86
21%|██ | 259/1230 [5:05:14<18:44:54, 69.51s/it]
{'loss': 1.1731, 'learning_rate': 1.833930230220236e-05, 'epoch': 0.21}
21%|██ | 259/1230 [5:05:14<18:44:54, 69.51s/it]
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2341
[2024-07-31 07:17:18,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.15 | bwd_microstep: 5388.50 | bwd_inner_microstep: 4973.93 | bwd_allreduce_microstep: 414.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3837
[2024-07-31 07:17:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.90 | bwd_microstep: 5459.63 | bwd_inner_microstep: 5371.63 | bwd_allreduce_microstep: 87.93 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3097
[2024-07-31 07:17:35,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.75 | bwd_microstep: 5170.47 | bwd_inner_microstep: 4886.02 | bwd_allreduce_microstep: 284.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 07:17:44,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.14 | bwd_microstep: 5073.45 | bwd_inner_microstep: 5043.42 | bwd_allreduce_microstep: 29.96 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 07:17:53,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.80 | bwd_microstep: 4889.78 | bwd_inner_microstep: 4870.42 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662
[2024-07-31 07:18:02,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.48 | bwd_microstep: 5092.08 | bwd_inner_microstep: 5025.13 | bwd_allreduce_microstep: 66.89 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 07:18:10,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.39 | bwd_microstep: 5184.07 | bwd_inner_microstep: 4781.21 | bwd_allreduce_microstep: 402.80 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696
[2024-07-31 07:18:19,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76
[2024-07-31 07:18:19,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.04 | bwd_microstep: 4734.16 | bwd_inner_microstep: 4709.41 | bwd_allreduce_microstep: 24.68 | step_microstep: 182.95
[2024-07-31 07:18:19,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28631.57 | bwd: 40992.12 | bwd_inner: 39661.11 | bwd_allreduce: 1330.52 | step: 183.53
21%|██ | 260/1230 [5:06:24<18:45:55, 69.64s/it]
{'loss': 1.2047, 'learning_rate': 1.8324740710443955e-05, 'epoch': 0.21}
21%|██ | 260/1230 [5:06:24<18:45:55, 69.64s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858
[2024-07-31 07:18:28,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.71 | bwd_microstep: 5151.94 | bwd_inner_microstep: 5132.84 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787
[2024-07-31 07:18:37,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.96 | bwd_microstep: 5349.91 | bwd_inner_microstep: 5276.52 | bwd_allreduce_microstep: 73.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751
[2024-07-31 07:18:45,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.78 | bwd_microstep: 5037.88 | bwd_inner_microstep: 5012.00 | bwd_allreduce_microstep: 25.81 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2882
[2024-07-31 07:18:54,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.93 | bwd_microstep: 5082.22 | bwd_inner_microstep: 4685.50 | bwd_allreduce_microstep: 396.64 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3753
[2024-07-31 07:19:03,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.92 | bwd_microstep: 5096.79 | bwd_inner_microstep: 5024.46 | bwd_allreduce_microstep: 72.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 07:19:11,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.94 | bwd_microstep: 5083.91 | bwd_inner_microstep: 5023.92 | bwd_allreduce_microstep: 59.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153
[2024-07-31 07:19:20,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.23 | bwd_microstep: 5184.99 | bwd_inner_microstep: 4782.56 | bwd_allreduce_microstep: 402.37 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3674
[2024-07-31 07:19:29,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 07:19:29,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.51 | bwd_microstep: 5062.76 | bwd_inner_microstep: 4984.14 | bwd_allreduce_microstep: 78.55 | step_microstep: 181.51
[2024-07-31 07:19:29,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29072.87 | bwd: 41050.38 | bwd_inner: 39921.88 | bwd_allreduce: 1128.00 | step: 182.10
21%|██ | 261/1230 [5:07:35<18:48:42, 69.89s/it]
{'loss': 1.2381, 'learning_rate': 1.831012139031189e-05, 'epoch': 0.21}
21%|██ | 261/1230 [5:07:35<18:48:42, 69.89s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4002
[2024-07-31 07:19:38,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.46 | bwd_microstep: 5467.81 | bwd_inner_microstep: 5409.85 | bwd_allreduce_microstep: 57.89 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 07:19:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.07 | bwd_microstep: 4967.08 | bwd_inner_microstep: 4947.74 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229
[2024-07-31 07:19:55,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3305.48 | bwd_microstep: 5124.58 | bwd_inner_microstep: 4726.62 | bwd_allreduce_microstep: 397.89 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737
[2024-07-31 07:20:04,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.93 | bwd_microstep: 5089.45 | bwd_inner_microstep: 5044.40 | bwd_allreduce_microstep: 44.98 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 07:20:12,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.39 | bwd_microstep: 4939.99 | bwd_inner_microstep: 4558.20 | bwd_allreduce_microstep: 381.72 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673
[2024-07-31 07:20:21,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.78 | bwd_microstep: 4918.79 | bwd_inner_microstep: 4891.76 | bwd_allreduce_microstep: 26.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696
[2024-07-31 07:20:29,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.74 | bwd_microstep: 5023.07 | bwd_inner_microstep: 4972.39 | bwd_allreduce_microstep: 50.61 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1603
[2024-07-31 07:20:38,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 07:20:38,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3460.87 | bwd_microstep: 5073.49 | bwd_inner_microstep: 4682.50 | bwd_allreduce_microstep: 390.91 | step_microstep: 181.48
[2024-07-31 07:20:38,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28075.64 | bwd: 40604.23 | bwd_inner: 39233.41 | bwd_allreduce: 1370.33 | step: 182.18
21%|██▏ | 262/1230 [5:08:44<18:43:17, 69.63s/it]
{'loss': 1.1746, 'learning_rate': 1.829544444318466e-05, 'epoch': 0.21}
21%|██▏ | 262/1230 [5:08:44<18:43:17, 69.63s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 4096
[2024-07-31 07:20:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.51 | bwd_microstep: 5392.02 | bwd_inner_microstep: 5334.70 | bwd_allreduce_microstep: 57.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846
[2024-07-31 07:20:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.74 | bwd_microstep: 5093.89 | bwd_inner_microstep: 5074.52 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606
[2024-07-31 07:21:05,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.12 | bwd_microstep: 5182.18 | bwd_inner_microstep: 5104.47 | bwd_allreduce_microstep: 77.64 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3700
[2024-07-31 07:21:14,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.58 | bwd_microstep: 5103.85 | bwd_inner_microstep: 5023.71 | bwd_allreduce_microstep: 80.07 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3686
[2024-07-31 07:21:22,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.19 | bwd_microstep: 5138.22 | bwd_inner_microstep: 5052.45 | bwd_allreduce_microstep: 85.70 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713
[2024-07-31 07:21:31,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.14 | bwd_microstep: 4995.29 | bwd_inner_microstep: 4957.48 | bwd_allreduce_microstep: 37.74 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135
[2024-07-31 07:21:40,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.31 | bwd_microstep: 5111.32 | bwd_inner_microstep: 4715.03 | bwd_allreduce_microstep: 396.23 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636
[2024-07-31 07:21:48,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 07:21:48,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.31 | bwd_microstep: 5057.90 | bwd_inner_microstep: 4995.03 | bwd_allreduce_microstep: 62.80 | step_microstep: 181.39
[2024-07-31 07:21:48,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28941.80 | bwd: 41074.65 | bwd_inner: 40257.33 | bwd_allreduce: 816.83 | step: 181.98
21%|██▏ | 263/1230 [5:09:54<18:45:37, 69.84s/it]
{'loss': 1.2142, 'learning_rate': 1.8280709970840352e-05, 'epoch': 0.21}
21%|██▏ | 263/1230 [5:09:54<18:45:37, 69.84s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3917
[2024-07-31 07:21:58,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3856.57 | bwd_microstep: 5318.14 | bwd_inner_microstep: 5276.98 | bwd_allreduce_microstep: 41.09 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576
[2024-07-31 07:22:06,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.05 | bwd_microstep: 5161.73 | bwd_inner_microstep: 5075.66 | bwd_allreduce_microstep: 86.00 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 07:22:15,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.01 | bwd_microstep: 5156.00 | bwd_inner_microstep: 5079.69 | bwd_allreduce_microstep: 76.24 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605
[2024-07-31 07:22:24,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.96 | bwd_microstep: 5132.87 | bwd_inner_microstep: 5057.21 | bwd_allreduce_microstep: 75.59 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3728
[2024-07-31 07:22:33,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.03 | bwd_microstep: 5145.95 | bwd_inner_microstep: 5076.58 | bwd_allreduce_microstep: 69.29 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700
[2024-07-31 07:22:42,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.24 | bwd_microstep: 5380.87 | bwd_inner_microstep: 5199.25 | bwd_allreduce_microstep: 181.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688
[2024-07-31 07:22:50,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.24 | bwd_microstep: 4883.97 | bwd_inner_microstep: 4864.61 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706
[2024-07-31 07:22:59,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 07:22:59,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.53 | bwd_microstep: 4921.19 | bwd_inner_microstep: 4899.20 | bwd_allreduce_microstep: 21.92 | step_microstep: 181.89
[2024-07-31 07:22:59,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29385.53 | bwd: 41100.69 | bwd_inner: 40529.11 | bwd_allreduce: 571.08 | step: 182.60
21%|██▏ | 264/1230 [5:11:05<18:49:11, 70.14s/it]
{'loss': 1.2034, 'learning_rate': 1.8265918075455985e-05, 'epoch': 0.21}
21%|██▏ | 264/1230 [5:11:05<18:49:11, 70.14s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4090
[2024-07-31 07:23:08,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.05 | bwd_microstep: 5205.93 | bwd_inner_microstep: 5186.88 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3914
[2024-07-31 07:23:17,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3803.50 | bwd_microstep: 5213.38 | bwd_inner_microstep: 5187.34 | bwd_allreduce_microstep: 25.96 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2085
[2024-07-31 07:23:25,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3084.14 | bwd_microstep: 5121.57 | bwd_inner_microstep: 4729.64 | bwd_allreduce_microstep: 391.86 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213
[2024-07-31 07:23:34,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.31 | bwd_microstep: 5170.70 | bwd_inner_microstep: 4768.56 | bwd_allreduce_microstep: 402.07 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180
[2024-07-31 07:23:43,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.95 | bwd_microstep: 5220.68 | bwd_inner_microstep: 4815.39 | bwd_allreduce_microstep: 405.22 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3611
[2024-07-31 07:23:51,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.06 | bwd_microstep: 5008.78 | bwd_inner_microstep: 4933.46 | bwd_allreduce_microstep: 75.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729
[2024-07-31 07:24:00,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.91 | bwd_microstep: 4984.51 | bwd_inner_microstep: 4965.15 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699
[2024-07-31 07:24:09,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 07:24:09,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.78 | bwd_microstep: 4911.80 | bwd_inner_microstep: 4887.29 | bwd_allreduce_microstep: 24.43 | step_microstep: 181.95
[2024-07-31 07:24:09,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28670.61 | bwd: 40837.31 | bwd_inner: 39473.65 | bwd_allreduce: 1363.16 | step: 182.53
22%|██▏ | 265/1230 [5:12:15<18:46:35, 70.05s/it]
{'loss': 1.1801, 'learning_rate': 1.8251068859606777e-05, 'epoch': 0.22}
22%|██▏ | 265/1230 [5:12:15<18:46:35, 70.05s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4082
[2024-07-31 07:24:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.16 | bwd_microstep: 5543.73 | bwd_inner_microstep: 5487.10 | bwd_allreduce_microstep: 56.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3557
[2024-07-31 07:24:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.57 | bwd_microstep: 5215.83 | bwd_inner_microstep: 5123.38 | bwd_allreduce_microstep: 92.38 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2864
[2024-07-31 07:24:36,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.37 | bwd_microstep: 5161.38 | bwd_inner_microstep: 4760.73 | bwd_allreduce_microstep: 400.58 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3767
[2024-07-31 07:24:45,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.08 | bwd_microstep: 5139.95 | bwd_inner_microstep: 5102.48 | bwd_allreduce_microstep: 37.41 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 07:24:54,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.88 | bwd_microstep: 5196.11 | bwd_inner_microstep: 5113.68 | bwd_allreduce_microstep: 82.36 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3652
[2024-07-31 07:25:02,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.44 | bwd_microstep: 5104.08 | bwd_inner_microstep: 5020.71 | bwd_allreduce_microstep: 83.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 07:25:11,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.86 | bwd_microstep: 4996.91 | bwd_inner_microstep: 4938.61 | bwd_allreduce_microstep: 58.23 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 07:25:20,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 07:25:20,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.41 | bwd_microstep: 5100.13 | bwd_inner_microstep: 4705.51 | bwd_allreduce_microstep: 394.54 | step_microstep: 183.59
[2024-07-31 07:25:20,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28767.68 | bwd: 41458.08 | bwd_inner: 40252.14 | bwd_allreduce: 1205.44 | step: 184.19
22%|██▏ | 266/1230 [5:13:25<18:47:54, 70.20s/it]
{'loss': 1.2023, 'learning_rate': 1.8236162426265424e-05, 'epoch': 0.22}
22%|██▏ | 266/1230 [5:13:25<18:47:54, 70.20s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4016
[2024-07-31 07:25:28,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3330.73 | bwd_microstep: 5080.22 | bwd_inner_microstep: 5061.00 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2242
[2024-07-31 07:25:37,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.30 | bwd_microstep: 5310.50 | bwd_inner_microstep: 4898.14 | bwd_allreduce_microstep: 412.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746
[2024-07-31 07:25:46,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.37 | bwd_microstep: 5231.16 | bwd_inner_microstep: 5169.01 | bwd_allreduce_microstep: 62.08 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631
[2024-07-31 07:25:55,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.00 | bwd_microstep: 5181.08 | bwd_inner_microstep: 5100.05 | bwd_allreduce_microstep: 80.97 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2093
[2024-07-31 07:26:03,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.61 | bwd_microstep: 5177.94 | bwd_inner_microstep: 4775.67 | bwd_allreduce_microstep: 402.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 07:26:12,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.23 | bwd_microstep: 5198.67 | bwd_inner_microstep: 4799.78 | bwd_allreduce_microstep: 398.83 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673
[2024-07-31 07:26:21,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.17 | bwd_microstep: 5187.34 | bwd_inner_microstep: 5107.02 | bwd_allreduce_microstep: 80.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676
[2024-07-31 07:26:30,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78
[2024-07-31 07:26:30,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.99 | bwd_microstep: 4906.61 | bwd_inner_microstep: 4883.33 | bwd_allreduce_microstep: 23.21 | step_microstep: 181.73
[2024-07-31 07:26:30,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28510.29 | bwd: 41273.50 | bwd_inner: 39793.94 | bwd_allreduce: 1479.07 | step: 182.44
22%|██▏ | 267/1230 [5:14:36<18:46:20, 70.18s/it]
{'loss': 1.2535, 'learning_rate': 1.8221198878801415e-05, 'epoch': 0.22}
22%|██▏ | 267/1230 [5:14:36<18:46:20, 70.18s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 07:26:38,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3316.15 | bwd_microstep: 5254.71 | bwd_inner_microstep: 5161.25 | bwd_allreduce_microstep: 93.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786
[2024-07-31 07:26:47,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.27 | bwd_microstep: 5198.82 | bwd_inner_microstep: 5160.35 | bwd_allreduce_microstep: 38.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194
[2024-07-31 07:26:56,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.06 | bwd_microstep: 5209.96 | bwd_inner_microstep: 4805.96 | bwd_allreduce_microstep: 403.93 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3781
[2024-07-31 07:27:05,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.81 | bwd_microstep: 4930.32 | bwd_inner_microstep: 4911.00 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743
[2024-07-31 07:27:13,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.12 | bwd_microstep: 5117.14 | bwd_inner_microstep: 5070.95 | bwd_allreduce_microstep: 46.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631
[2024-07-31 07:27:22,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.16 | bwd_microstep: 5108.20 | bwd_inner_microstep: 5038.84 | bwd_allreduce_microstep: 69.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 07:27:31,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.66 | bwd_microstep: 4892.34 | bwd_inner_microstep: 4873.06 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 07:27:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49
[2024-07-31 07:27:39,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.25 | bwd_microstep: 4997.24 | bwd_inner_microstep: 4948.84 | bwd_allreduce_microstep: 48.34 | step_microstep: 181.34
[2024-07-31 07:27:39,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28517.39 | bwd: 40708.73 | bwd_inner: 39970.19 | bwd_allreduce: 738.05 | step: 181.92
22%|██▏ | 268/1230 [5:15:45<18:42:13, 69.99s/it]
{'loss': 1.2258, 'learning_rate': 1.8206178320980295e-05, 'epoch': 0.22}
22%|██▏ | 268/1230 [5:15:45<18:42:13, 69.99s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3970
[2024-07-31 07:27:49,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.84 | bwd_microstep: 5576.12 | bwd_inner_microstep: 5505.35 | bwd_allreduce_microstep: 70.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3600
[2024-07-31 07:27:58,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.13 | bwd_microstep: 5254.17 | bwd_inner_microstep: 5124.81 | bwd_allreduce_microstep: 129.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764
[2024-07-31 07:28:06,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.40 | bwd_microstep: 5028.73 | bwd_inner_microstep: 5003.65 | bwd_allreduce_microstep: 25.00 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2076
[2024-07-31 07:28:15,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.59 | bwd_microstep: 5169.96 | bwd_inner_microstep: 4770.08 | bwd_allreduce_microstep: 399.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 07:28:23,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.69 | bwd_microstep: 4714.94 | bwd_inner_microstep: 4691.80 | bwd_allreduce_microstep: 23.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 07:28:32,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.70 | bwd_microstep: 5001.56 | bwd_inner_microstep: 4948.07 | bwd_allreduce_microstep: 53.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 07:28:40,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.29 | bwd_microstep: 5074.03 | bwd_inner_microstep: 5014.24 | bwd_allreduce_microstep: 59.71 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 07:28:49,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 07:28:49,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.92 | bwd_microstep: 5061.76 | bwd_inner_microstep: 5004.24 | bwd_allreduce_microstep: 57.44 | step_microstep: 182.86
[2024-07-31 07:28:49,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28570.46 | bwd: 40881.24 | bwd_inner: 40062.17 | bwd_allreduce: 818.57 | step: 183.45
22%|██▏ | 269/1230 [5:16:55<18:40:03, 69.93s/it]
{'loss': 1.197, 'learning_rate': 1.819110085696295e-05, 'epoch': 0.22}
22%|██▏ | 269/1230 [5:16:55<18:40:03, 69.93s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652
[2024-07-31 07:28:58,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.51 | bwd_microstep: 5478.20 | bwd_inner_microstep: 5364.08 | bwd_allreduce_microstep: 114.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806
[2024-07-31 07:29:07,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.63 | bwd_microstep: 5207.99 | bwd_inner_microstep: 5151.23 | bwd_allreduce_microstep: 56.70 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812
[2024-07-31 07:29:16,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.19 | bwd_microstep: 5037.41 | bwd_inner_microstep: 5018.00 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784
[2024-07-31 07:29:25,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.83 | bwd_microstep: 5031.60 | bwd_inner_microstep: 5012.14 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730
[2024-07-31 07:29:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.33 | bwd_microstep: 5126.00 | bwd_inner_microstep: 5080.11 | bwd_allreduce_microstep: 45.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652
[2024-07-31 07:29:42,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.28 | bwd_microstep: 5060.07 | bwd_inner_microstep: 4995.06 | bwd_allreduce_microstep: 64.94 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691
[2024-07-31 07:29:51,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.84 | bwd_microstep: 4884.44 | bwd_inner_microstep: 4865.15 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683
[2024-07-31 07:29:59,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 07:29:59,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.94 | bwd_microstep: 4853.37 | bwd_inner_microstep: 4809.23 | bwd_allreduce_microstep: 44.07 | step_microstep: 181.40
[2024-07-31 07:29:59,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28943.47 | bwd: 40679.06 | bwd_inner: 40294.93 | bwd_allreduce: 383.64 | step: 182.09
22%|██▏ | 270/1230 [5:18:05<18:39:02, 69.94s/it]
{'loss': 1.214, 'learning_rate': 1.817596659130489e-05, 'epoch': 0.22}
22%|██▏ | 270/1230 [5:18:05<18:39:02, 69.94s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4072
[2024-07-31 07:30:08,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3351.88 | bwd_microstep: 5148.00 | bwd_inner_microstep: 5128.96 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740
[2024-07-31 07:30:16,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.97 | bwd_microstep: 5091.63 | bwd_inner_microstep: 5056.67 | bwd_allreduce_microstep: 34.88 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3793
[2024-07-31 07:30:25,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.10 | bwd_microstep: 5084.00 | bwd_inner_microstep: 5063.49 | bwd_allreduce_microstep: 20.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3772
[2024-07-31 07:30:34,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.38 | bwd_microstep: 5156.75 | bwd_inner_microstep: 5109.10 | bwd_allreduce_microstep: 47.58 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204
[2024-07-31 07:30:43,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.57 | bwd_microstep: 5215.15 | bwd_inner_microstep: 4808.93 | bwd_allreduce_microstep: 406.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 07:30:51,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.79 | bwd_microstep: 5064.05 | bwd_inner_microstep: 5003.50 | bwd_allreduce_microstep: 60.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 07:31:00,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.16 | bwd_microstep: 4987.51 | bwd_inner_microstep: 4951.80 | bwd_allreduce_microstep: 35.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 07:31:08,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 07:31:08,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3008.99 | bwd_microstep: 4899.65 | bwd_inner_microstep: 4522.09 | bwd_allreduce_microstep: 377.49 | step_microstep: 181.82 [2024-07-31 07:31:08,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28196.73 | bwd: 40646.72 | bwd_inner: 39644.50 | bwd_allreduce: 1001.73 | step: 182.39 22%|██▏ | 271/1230 [5:19:14<18:34:13, 69.71s/it] {'loss': 1.2129, 'learning_rate': 1.816077562895551e-05, 'epoch': 0.22} 22%|██▏ | 271/1230 [5:19:14<18:34:13, 69.71s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 07:31:17,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.09 | bwd_microstep: 5393.23 | bwd_inner_microstep: 5374.11 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4058 [2024-07-31 07:31:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.97 | bwd_microstep: 5147.60 | bwd_inner_microstep: 5128.29 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3799 [2024-07-31 07:31:35,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3627.74 | bwd_microstep: 5184.08 | bwd_inner_microstep: 5118.00 | bwd_allreduce_microstep: 66.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-31 07:31:44,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.25 | bwd_microstep: 5187.90 | bwd_inner_microstep: 4786.10 | bwd_allreduce_microstep: 401.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-31 07:31:53,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.95 | bwd_microstep: 5171.36 | bwd_inner_microstep: 5101.78 | bwd_allreduce_microstep: 69.51 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2173 [2024-07-31 07:32:01,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.26 | bwd_microstep: 5081.91 | bwd_inner_microstep: 4689.97 | bwd_allreduce_microstep: 391.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 07:32:10,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.38 | bwd_microstep: 5191.71 | bwd_inner_microstep: 5137.33 | bwd_allreduce_microstep: 54.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 07:32:19,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 07:32:19,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.59 | bwd_microstep: 5063.15 | bwd_inner_microstep: 5003.45 | bwd_allreduce_microstep: 59.64 | step_microstep: 182.61 [2024-07-31 07:32:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29017.13 | bwd: 41420.93 | bwd_inner: 40338.97 | bwd_allreduce: 1081.47 | step: 183.20 22%|██▏ | 272/1230 [5:20:25<18:38:08, 70.03s/it] {'loss': 1.1589, 'learning_rate': 1.814552807525738e-05, 'epoch': 0.22} 
22%|██▏ | 272/1230 [5:20:25<18:38:08, 70.03s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3835 [2024-07-31 07:32:28,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.45 | bwd_microstep: 5637.00 | bwd_inner_microstep: 5477.00 | bwd_allreduce_microstep: 159.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-31 07:32:37,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.20 | bwd_microstep: 5296.15 | bwd_inner_microstep: 4885.74 | bwd_allreduce_microstep: 410.35 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3852 [2024-07-31 07:32:46,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.79 | bwd_microstep: 5223.93 | bwd_inner_microstep: 5170.79 | bwd_allreduce_microstep: 53.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-31 07:32:54,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3240.59 | bwd_microstep: 4849.39 | bwd_inner_microstep: 4824.81 | bwd_allreduce_microstep: 24.51 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-31 07:33:02,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.88 | bwd_microstep: 4808.77 | bwd_inner_microstep: 4788.60 | bwd_allreduce_microstep: 20.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-31 07:33:11,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.30 | bwd_microstep: 5036.50 | bwd_inner_microstep: 5016.00 | bwd_allreduce_microstep: 20.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 07:33:19,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.71 | 
bwd_microstep: 4847.64 | bwd_inner_microstep: 4802.31 | bwd_allreduce_microstep: 45.26 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2201 [2024-07-31 07:33:28,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 07:33:28,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.84 | bwd_microstep: 5036.67 | bwd_inner_microstep: 4644.44 | bwd_allreduce_microstep: 392.16 | step_microstep: 181.66 [2024-07-31 07:33:28,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27858.64 | bwd: 40736.04 | bwd_inner: 39609.63 | bwd_allreduce: 1125.92 | step: 182.36 22%|██▏ | 273/1230 [5:21:34<18:31:44, 69.70s/it] {'loss': 1.2022, 'learning_rate': 1.81302240359455e-05, 'epoch': 0.22} 22%|██▏ | 273/1230 [5:21:34<18:31:44, 69.70s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4025 [2024-07-31 07:33:37,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3854.67 | bwd_microstep: 5247.86 | bwd_inner_microstep: 5228.81 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3851 [2024-07-31 07:33:46,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.47 | bwd_microstep: 5063.32 | bwd_inner_microstep: 5027.10 | bwd_allreduce_microstep: 36.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3853 [2024-07-31 07:33:55,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.36 | bwd_microstep: 5109.01 | bwd_inner_microstep: 5089.61 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-31 07:34:03,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.51 | bwd_microstep: 5166.28 | bwd_inner_microstep: 4764.43 | bwd_allreduce_microstep: 401.78 
| step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3634 [2024-07-31 07:34:12,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.72 | bwd_microstep: 5173.10 | bwd_inner_microstep: 5080.91 | bwd_allreduce_microstep: 92.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 07:34:21,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.15 | bwd_microstep: 5057.69 | bwd_inner_microstep: 4996.80 | bwd_allreduce_microstep: 60.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 07:34:29,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.68 | bwd_microstep: 4891.94 | bwd_inner_microstep: 4872.56 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 07:34:38,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 07:34:38,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.13 | bwd_microstep: 4969.53 | bwd_inner_microstep: 4919.97 | bwd_allreduce_microstep: 49.49 | step_microstep: 182.64 [2024-07-31 07:34:38,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29194.59 | bwd: 40678.70 | bwd_inner: 39980.13 | bwd_allreduce: 698.07 | step: 183.21 22%|██▏ | 274/1230 [5:22:44<18:32:59, 69.85s/it] {'loss': 1.2463, 'learning_rate': 1.8114863617146576e-05, 'epoch': 0.22} 22%|██▏ | 274/1230 [5:22:44<18:32:59, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 07:34:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.83 | bwd_microstep: 5472.23 | bwd_inner_microstep: 5356.73 | bwd_allreduce_microstep: 115.44 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic 
token length: 2061 [2024-07-31 07:34:56,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.58 | bwd_microstep: 5243.81 | bwd_inner_microstep: 4841.62 | bwd_allreduce_microstep: 402.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-31 07:35:05,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.20 | bwd_microstep: 5295.88 | bwd_inner_microstep: 5220.03 | bwd_allreduce_microstep: 75.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858 [2024-07-31 07:35:14,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.10 | bwd_microstep: 5095.88 | bwd_inner_microstep: 5076.59 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3655 [2024-07-31 07:35:23,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.97 | bwd_microstep: 5076.66 | bwd_inner_microstep: 5032.89 | bwd_allreduce_microstep: 43.71 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 07:35:32,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.17 | bwd_microstep: 5167.97 | bwd_inner_microstep: 5081.50 | bwd_allreduce_microstep: 86.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-31 07:35:40,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3024.89 | bwd_microstep: 4961.44 | bwd_inner_microstep: 4579.28 | bwd_allreduce_microstep: 382.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 07:35:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 07:35:48,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3439.72 | 
bwd_microstep: 5007.30 | bwd_inner_microstep: 4620.40 | bwd_allreduce_microstep: 386.83 | step_microstep: 181.74 [2024-07-31 07:35:48,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28468.37 | bwd: 41321.16 | bwd_inner: 39808.97 | bwd_allreduce: 1511.69 | step: 182.33 22%|██▏ | 275/1230 [5:23:54<18:33:06, 69.93s/it] {'loss': 1.2163, 'learning_rate': 1.8099446925378278e-05, 'epoch': 0.22} 22%|██▏ | 275/1230 [5:23:54<18:33:06, 69.93s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3959 [2024-07-31 07:35:57,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.02 | bwd_microstep: 5301.44 | bwd_inner_microstep: 5233.78 | bwd_allreduce_microstep: 67.58 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 07:36:06,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.12 | bwd_microstep: 5295.27 | bwd_inner_microstep: 5201.22 | bwd_allreduce_microstep: 93.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3833 [2024-07-31 07:36:14,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.13 | bwd_microstep: 4864.82 | bwd_inner_microstep: 4845.50 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2115 [2024-07-31 07:36:23,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.31 | bwd_microstep: 5271.44 | bwd_inner_microstep: 4862.43 | bwd_allreduce_microstep: 408.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 07:36:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3441.12 | bwd_microstep: 5101.29 | bwd_inner_microstep: 5039.07 | bwd_allreduce_microstep: 62.15 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3644 [2024-07-31 07:36:40,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.24 | bwd_microstep: 5127.75 | bwd_inner_microstep: 5054.66 | bwd_allreduce_microstep: 73.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 07:36:49,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.46 | bwd_microstep: 4955.13 | bwd_inner_microstep: 4920.36 | bwd_allreduce_microstep: 34.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 07:36:58,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 07:36:58,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.31 | bwd_microstep: 4890.64 | bwd_inner_microstep: 4871.27 | bwd_allreduce_microstep: 19.30 | step_microstep: 182.22 [2024-07-31 07:36:58,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28395.63 | bwd: 40807.77 | bwd_inner: 40028.23 | bwd_allreduce: 779.04 | step: 182.90 22%|██▏ | 276/1230 [5:25:04<18:30:02, 69.81s/it] {'loss': 1.2308, 'learning_rate': 1.8083974067548506e-05, 'epoch': 0.22} 22%|██▏ | 276/1230 [5:25:04<18:30:02, 69.81s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823 [2024-07-31 07:37:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.33 | bwd_microstep: 5411.42 | bwd_inner_microstep: 5328.73 | bwd_allreduce_microstep: 82.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3556 [2024-07-31 07:37:16,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.61 | bwd_microstep: 5125.14 | bwd_inner_microstep: 5044.87 | bwd_allreduce_microstep: 80.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778 [2024-07-31 07:37:24,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3769.68 | bwd_microstep: 5054.41 | bwd_inner_microstep: 5033.41 | bwd_allreduce_microstep: 20.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 07:37:33,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3376.63 | bwd_microstep: 4807.60 | bwd_inner_microstep: 4775.27 | bwd_allreduce_microstep: 32.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 07:37:41,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.19 | bwd_microstep: 4918.99 | bwd_inner_microstep: 4896.05 | bwd_allreduce_microstep: 22.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 07:37:50,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.04 | bwd_microstep: 4995.77 | bwd_inner_microstep: 4976.43 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 07:37:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.46 | bwd_microstep: 5098.20 | bwd_inner_microstep: 5036.68 | bwd_allreduce_microstep: 61.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680 [2024-07-31 07:38:08,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 07:38:08,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.20 | bwd_microstep: 4886.81 | bwd_inner_microstep: 4867.41 | bwd_allreduce_microstep: 19.33 | step_microstep: 182.02 [2024-07-31 07:38:08,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29141.02 | bwd: 40298.34 | bwd_inner: 39958.81 | bwd_allreduce: 339.04 | step: 182.60 23%|██▎ | 277/1230 [5:26:13<18:28:41, 69.80s/it] {'loss': 1.127, 'learning_rate': 1.806844515095465e-05, 'epoch': 0.23} 
23%|██▎ | 277/1230 [5:26:13<18:28:41, 69.80s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 07:38:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.91 | bwd_microstep: 5180.62 | bwd_inner_microstep: 5132.04 | bwd_allreduce_microstep: 48.52 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3711 [2024-07-31 07:38:25,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.33 | bwd_microstep: 5165.51 | bwd_inner_microstep: 5100.81 | bwd_allreduce_microstep: 64.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-31 07:38:34,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.79 | bwd_microstep: 5188.41 | bwd_inner_microstep: 5132.90 | bwd_allreduce_microstep: 55.45 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3601 [2024-07-31 07:38:42,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.97 | bwd_microstep: 4818.61 | bwd_inner_microstep: 4792.51 | bwd_allreduce_microstep: 26.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 07:38:51,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.72 | bwd_microstep: 5136.49 | bwd_inner_microstep: 5057.72 | bwd_allreduce_microstep: 78.70 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2151 [2024-07-31 07:39:00,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.81 | bwd_microstep: 5057.68 | bwd_inner_microstep: 4662.42 | bwd_allreduce_microstep: 395.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 07:39:08,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.11 | 
bwd_microstep: 5062.49 | bwd_inner_microstep: 4996.89 | bwd_allreduce_microstep: 65.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 07:39:17,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 07:39:17,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.35 | bwd_microstep: 4995.76 | bwd_inner_microstep: 4938.47 | bwd_allreduce_microstep: 57.22 | step_microstep: 181.70 [2024-07-31 07:39:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28500.91 | bwd: 40605.54 | bwd_inner: 39813.69 | bwd_allreduce: 791.37 | step: 182.29 23%|██▎ | 278/1230 [5:27:23<18:25:47, 69.69s/it] {'loss': 1.2639, 'learning_rate': 1.8052860283282832e-05, 'epoch': 0.23} 23%|██▎ | 278/1230 [5:27:23<18:25:47, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3836 [2024-07-31 07:39:26,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.29 | bwd_microstep: 5264.29 | bwd_inner_microstep: 5206.88 | bwd_allreduce_microstep: 57.34 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3831 [2024-07-31 07:39:35,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.02 | bwd_microstep: 5230.55 | bwd_inner_microstep: 5186.48 | bwd_allreduce_microstep: 44.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-31 07:39:43,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.73 | bwd_microstep: 4883.95 | bwd_inner_microstep: 4828.85 | bwd_allreduce_microstep: 55.03 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2121 [2024-07-31 07:39:52,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.55 | bwd_microstep: 5202.00 | bwd_inner_microstep: 4796.35 | bwd_allreduce_microstep: 405.58 
| step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 07:40:00,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.82 | bwd_microstep: 5112.31 | bwd_inner_microstep: 5046.41 | bwd_allreduce_microstep: 65.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-31 07:40:09,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.66 | bwd_microstep: 5218.65 | bwd_inner_microstep: 4812.66 | bwd_allreduce_microstep: 405.92 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2155 [2024-07-31 07:40:18,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3445.34 | bwd_microstep: 5030.68 | bwd_inner_microstep: 4641.60 | bwd_allreduce_microstep: 389.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2119 [2024-07-31 07:40:27,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 07:40:27,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.92 | bwd_microstep: 5181.10 | bwd_inner_microstep: 4776.81 | bwd_allreduce_microstep: 404.22 | step_microstep: 183.72 [2024-07-31 07:40:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28150.22 | bwd: 41123.51 | bwd_inner: 39295.97 | bwd_allreduce: 1827.05 | step: 184.42 23%|██▎ | 279/1230 [5:28:32<18:24:13, 69.67s/it] {'loss': 1.2254, 'learning_rate': 1.8037219572607177e-05, 'epoch': 0.23} 23%|██▎ | 279/1230 [5:28:32<18:24:13, 69.67s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3784 [2024-07-31 07:40:36,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.48 | bwd_microstep: 5365.68 | bwd_inner_microstep: 5269.11 | bwd_allreduce_microstep: 96.50 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3815 [2024-07-31 07:40:44,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.59 | bwd_microstep: 5216.25 | bwd_inner_microstep: 5161.37 | bwd_allreduce_microstep: 54.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-31 07:40:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.26 | bwd_microstep: 5187.29 | bwd_inner_microstep: 5098.53 | bwd_allreduce_microstep: 88.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 07:41:02,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.06 | bwd_microstep: 5147.34 | bwd_inner_microstep: 5083.55 | bwd_allreduce_microstep: 63.73 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3716 [2024-07-31 07:41:10,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3321.28 | bwd_microstep: 4988.34 | bwd_inner_microstep: 4942.21 | bwd_allreduce_microstep: 46.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 07:41:18,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.87 | bwd_microstep: 4839.81 | bwd_inner_microstep: 4796.55 | bwd_allreduce_microstep: 43.20 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1162 [2024-07-31 07:41:27,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3452.57 | bwd_microstep: 5083.43 | bwd_inner_microstep: 4691.76 | bwd_allreduce_microstep: 391.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 07:41:36,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 07:41:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.52 | 
bwd_microstep: 5041.52 | bwd_inner_microstep: 4983.81 | bwd_allreduce_microstep: 57.65 | step_microstep: 182.73 [2024-07-31 07:41:36,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28005.53 | bwd: 40869.66 | bwd_inner: 40026.83 | bwd_allreduce: 842.35 | step: 183.31 23%|██▎ | 280/1230 [5:29:42<18:20:53, 69.53s/it] {'loss': 1.2645, 'learning_rate': 1.8021523127389066e-05, 'epoch': 0.23} 23%|██▎ | 280/1230 [5:29:42<18:20:53, 69.53s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3949 [2024-07-31 07:41:45,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.91 | bwd_microstep: 5226.92 | bwd_inner_microstep: 5207.82 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4048 [2024-07-31 07:41:53,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3332.64 | bwd_microstep: 5122.79 | bwd_inner_microstep: 5103.47 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-31 07:42:02,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3455.81 | bwd_microstep: 5149.45 | bwd_inner_microstep: 5100.22 | bwd_allreduce_microstep: 49.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-31 07:42:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.70 | bwd_microstep: 5041.27 | bwd_inner_microstep: 5017.52 | bwd_allreduce_microstep: 23.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3622 [2024-07-31 07:42:20,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.60 | bwd_microstep: 5192.49 | bwd_inner_microstep: 5097.02 | bwd_allreduce_microstep: 95.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3713 [2024-07-31 07:42:28,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.05 | bwd_microstep: 4838.43 | bwd_inner_microstep: 4810.90 | bwd_allreduce_microstep: 27.46 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 07:42:36,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.02 | bwd_microstep: 5021.65 | bwd_inner_microstep: 4985.47 | bwd_allreduce_microstep: 36.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669 [2024-07-31 07:42:45,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-31 07:42:45,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.67 | bwd_microstep: 4915.22 | bwd_inner_microstep: 4889.04 | bwd_allreduce_microstep: 26.12 | step_microstep: 181.40 [2024-07-31 07:42:45,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28680.31 | bwd: 40508.21 | bwd_inner: 40211.41 | bwd_allreduce: 296.31 | step: 181.98 23%|██▎ | 281/1230 [5:30:51<18:19:42, 69.53s/it] {'loss': 1.1495, 'learning_rate': 1.800577105647635e-05, 'epoch': 0.23} 23%|██▎ | 281/1230 [5:30:51<18:19:42, 69.53s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3933 [2024-07-31 07:42:54,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.17 | bwd_microstep: 5412.34 | bwd_inner_microstep: 5329.61 | bwd_allreduce_microstep: 82.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580 [2024-07-31 07:43:04,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.05 | bwd_microstep: 5465.04 | bwd_inner_microstep: 5286.51 | bwd_allreduce_microstep: 178.46 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2148 [2024-07-31 07:43:12,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3562.72 | bwd_microstep: 5269.25 | bwd_inner_microstep: 4861.25 | bwd_allreduce_microstep: 407.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-31 07:43:21,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.00 | bwd_microstep: 5179.47 | bwd_inner_microstep: 4776.62 | bwd_allreduce_microstep: 402.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-31 07:43:30,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.76 | bwd_microstep: 5040.20 | bwd_inner_microstep: 4999.15 | bwd_allreduce_microstep: 40.98 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-31 07:43:39,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.60 | bwd_microstep: 5172.34 | bwd_inner_microstep: 4770.93 | bwd_allreduce_microstep: 401.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 07:43:47,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.48 | bwd_microstep: 5119.38 | bwd_inner_microstep: 4721.24 | bwd_allreduce_microstep: 398.07 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 07:43:56,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 07:43:56,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.75 | bwd_microstep: 5011.13 | bwd_inner_microstep: 4961.16 | bwd_allreduce_microstep: 49.90 | step_microstep: 181.95 [2024-07-31 07:43:56,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28701.43 | bwd: 41669.12 | bwd_inner: 39706.40 | bwd_allreduce: 1962.21 | step: 182.65 23%|██▎ | 282/1230 [5:32:02<18:24:05, 69.88s/it] {'loss': 1.1708, 'learning_rate': 1.7989963469102643e-05, 'epoch': 
0.23} 23%|██▎ | 282/1230 [5:32:02<18:24:05, 69.88s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 07:44:05,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.66 | bwd_microstep: 5390.30 | bwd_inner_microstep: 5290.19 | bwd_allreduce_microstep: 100.03 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3897 [2024-07-31 07:44:14,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.40 | bwd_microstep: 5308.11 | bwd_inner_microstep: 5247.58 | bwd_allreduce_microstep: 60.46 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2063 [2024-07-31 07:44:23,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.42 | bwd_microstep: 5294.99 | bwd_inner_microstep: 4885.35 | bwd_allreduce_microstep: 409.57 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 07:44:32,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.24 | bwd_microstep: 5191.73 | bwd_inner_microstep: 5116.73 | bwd_allreduce_microstep: 74.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-31 07:44:41,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.06 | bwd_microstep: 5141.71 | bwd_inner_microstep: 5067.02 | bwd_allreduce_microstep: 74.63 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3189 [2024-07-31 07:44:49,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.45 | bwd_microstep: 5055.86 | bwd_inner_microstep: 4869.70 | bwd_allreduce_microstep: 186.09 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3703 [2024-07-31 07:44:57,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3430.75 | bwd_microstep: 4826.12 | bwd_inner_microstep: 4800.35 | bwd_allreduce_microstep: 25.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3700 [2024-07-31 07:45:06,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 07:45:06,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.28 | bwd_microstep: 5048.06 | bwd_inner_microstep: 4977.79 | bwd_allreduce_microstep: 70.20 | step_microstep: 182.09 [2024-07-31 07:45:06,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28672.18 | bwd: 41256.85 | bwd_inner: 40254.65 | bwd_allreduce: 1001.72 | step: 182.70 23%|██▎ | 283/1230 [5:33:12<18:24:44, 69.99s/it] {'loss': 1.2124, 'learning_rate': 1.797410047488653e-05, 'epoch': 0.23} 23%|██▎ | 283/1230 [5:33:12<18:24:44, 69.99s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2410 [2024-07-31 07:45:15,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.14 | bwd_microstep: 5389.94 | bwd_inner_microstep: 4975.37 | bwd_allreduce_microstep: 414.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3854 [2024-07-31 07:45:24,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.62 | bwd_microstep: 5140.85 | bwd_inner_microstep: 5118.24 | bwd_allreduce_microstep: 22.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-31 07:45:33,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.92 | bwd_microstep: 5114.17 | bwd_inner_microstep: 5036.50 | bwd_allreduce_microstep: 77.60 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2080 [2024-07-31 07:45:42,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.62 | bwd_microstep: 5144.70 | bwd_inner_microstep: 4747.05 | 
bwd_allreduce_microstep: 397.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 07:45:50,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.95 | bwd_microstep: 5011.49 | bwd_inner_microstep: 4985.16 | bwd_allreduce_microstep: 26.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 07:45:59,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.78 | bwd_microstep: 5066.84 | bwd_inner_microstep: 4999.42 | bwd_allreduce_microstep: 67.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 07:46:08,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.25 | bwd_microstep: 4897.43 | bwd_inner_microstep: 4878.08 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3650 [2024-07-31 07:46:17,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 07:46:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.61 | bwd_microstep: 5135.30 | bwd_inner_microstep: 5051.03 | bwd_allreduce_microstep: 84.19 | step_microstep: 181.95 [2024-07-31 07:46:17,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29050.81 | bwd: 40900.68 | bwd_inner: 39790.79 | bwd_allreduce: 1109.38 | step: 182.54 23%|██▎ | 284/1230 [5:34:22<18:24:56, 70.08s/it] {'loss': 1.2583, 'learning_rate': 1.7958182183830816e-05, 'epoch': 0.23} 23%|██▎ | 284/1230 [5:34:22<18:24:56, 70.08s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 07:46:26,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.65 | bwd_microstep: 5357.45 | bwd_inner_microstep: 5338.39 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.09 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3580 [2024-07-31 07:46:35,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.24 | bwd_microstep: 5211.85 | bwd_inner_microstep: 5117.07 | bwd_allreduce_microstep: 94.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3912 [2024-07-31 07:46:43,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3299.56 | bwd_microstep: 4965.17 | bwd_inner_microstep: 4945.85 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-31 07:46:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.35 | bwd_microstep: 5005.53 | bwd_inner_microstep: 4986.03 | bwd_allreduce_microstep: 19.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 07:47:00,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.37 | bwd_microstep: 5128.84 | bwd_inner_microstep: 5059.59 | bwd_allreduce_microstep: 69.18 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2119 [2024-07-31 07:47:09,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3447.74 | bwd_microstep: 5039.92 | bwd_inner_microstep: 4650.29 | bwd_allreduce_microstep: 389.56 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3650 [2024-07-31 07:47:18,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.94 | bwd_microstep: 5182.45 | bwd_inner_microstep: 5084.74 | bwd_allreduce_microstep: 97.64 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 07:47:26,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.94 [2024-07-31 07:47:26,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3678.05 | bwd_microstep: 4886.52 | bwd_inner_microstep: 4867.21 | bwd_allreduce_microstep: 19.25 | step_microstep: 182.55 [2024-07-31 07:47:26,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28829.78 | bwd: 40777.72 | bwd_inner: 40049.11 | bwd_allreduce: 728.11 | step: 183.14 23%|██▎ | 285/1230 [5:35:32<18:23:06, 70.04s/it] {'loss': 1.2181, 'learning_rate': 1.794220870632177e-05, 'epoch': 0.23} 23%|██▎ | 285/1230 [5:35:32<18:23:06, 70.04s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940 [2024-07-31 07:47:36,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.11 | bwd_microstep: 5405.71 | bwd_inner_microstep: 5341.12 | bwd_allreduce_microstep: 64.52 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3956 [2024-07-31 07:47:45,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.00 | bwd_microstep: 5194.72 | bwd_inner_microstep: 5175.39 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-31 07:47:53,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.99 | bwd_microstep: 5197.52 | bwd_inner_microstep: 5115.95 | bwd_allreduce_microstep: 81.50 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3630 [2024-07-31 07:48:02,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.90 | bwd_microstep: 5140.51 | bwd_inner_microstep: 5054.37 | bwd_allreduce_microstep: 86.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-31 07:48:11,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.77 | bwd_microstep: 5197.00 | bwd_inner_microstep: 5140.88 | bwd_allreduce_microstep: 56.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3767 [2024-07-31 07:48:20,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.77 | bwd_microstep: 5070.64 | bwd_inner_microstep: 5029.90 | bwd_allreduce_microstep: 40.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 07:48:29,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.89 | bwd_microstep: 5159.40 | bwd_inner_microstep: 5085.04 | bwd_allreduce_microstep: 74.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-31 07:48:37,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 07:48:37,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.75 | bwd_microstep: 5104.44 | bwd_inner_microstep: 4710.42 | bwd_allreduce_microstep: 393.95 | step_microstep: 181.61 [2024-07-31 07:48:37,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29077.08 | bwd: 41469.92 | bwd_inner: 40653.01 | bwd_allreduce: 816.42 | step: 182.30 23%|██▎ | 286/1230 [5:36:43<18:25:54, 70.29s/it] {'loss': 1.2052, 'learning_rate': 1.7926180153128358e-05, 'epoch': 0.23} 23%|██▎ | 286/1230 [5:36:43<18:25:54, 70.29s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2350 [2024-07-31 07:48:46,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.42 | bwd_microstep: 5343.80 | bwd_inner_microstep: 4930.78 | bwd_allreduce_microstep: 412.96 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2837 [2024-07-31 07:48:55,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.29 | bwd_microstep: 5213.21 | bwd_inner_microstep: 4806.34 | bwd_allreduce_microstep: 406.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-31 07:49:04,472] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3613.55 | bwd_microstep: 5177.81 | bwd_inner_microstep: 5123.80 | bwd_allreduce_microstep: 53.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3855 [2024-07-31 07:49:13,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.34 | bwd_microstep: 5113.28 | bwd_inner_microstep: 5093.92 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 07:49:21,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.30 | bwd_microstep: 4885.77 | bwd_inner_microstep: 4866.32 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3749 [2024-07-31 07:49:30,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.21 | bwd_microstep: 5172.21 | bwd_inner_microstep: 5093.42 | bwd_allreduce_microstep: 78.73 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-31 07:49:39,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.88 | bwd_microstep: 5157.58 | bwd_inner_microstep: 4757.08 | bwd_allreduce_microstep: 400.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 07:49:48,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 07:49:48,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.62 | bwd_microstep: 5176.29 | bwd_inner_microstep: 5095.15 | bwd_allreduce_microstep: 81.07 | step_microstep: 181.34 [2024-07-31 07:49:48,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29035.51 | bwd: 41239.93 | bwd_inner: 39766.74 | bwd_allreduce: 1472.70 | step: 181.91 23%|██▎ | 287/1230 [5:37:54<18:26:12, 70.38s/it] {'loss': 1.2095, 'learning_rate': 
1.791009663540146e-05, 'epoch': 0.23} 23%|██▎ | 287/1230 [5:37:54<18:26:12, 70.38s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3887 [2024-07-31 07:49:57,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3862.86 | bwd_microstep: 5429.52 | bwd_inner_microstep: 5370.92 | bwd_allreduce_microstep: 58.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3796 [2024-07-31 07:50:06,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.20 | bwd_microstep: 5019.77 | bwd_inner_microstep: 4987.08 | bwd_allreduce_microstep: 32.62 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2299 [2024-07-31 07:50:14,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.80 | bwd_microstep: 5050.50 | bwd_inner_microstep: 4662.63 | bwd_allreduce_microstep: 387.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-31 07:50:22,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.63 | bwd_microstep: 5022.79 | bwd_inner_microstep: 4636.22 | bwd_allreduce_microstep: 386.49 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3746 [2024-07-31 07:50:31,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.76 | bwd_microstep: 5212.17 | bwd_inner_microstep: 5139.60 | bwd_allreduce_microstep: 72.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 07:50:40,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.64 | bwd_microstep: 5125.92 | bwd_inner_microstep: 5072.98 | bwd_allreduce_microstep: 52.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 07:50:49,045] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3733.40 | bwd_microstep: 4898.45 | bwd_inner_microstep: 4875.68 | bwd_allreduce_microstep: 22.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 07:50:57,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 07:50:57,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.57 | bwd_microstep: 4990.58 | bwd_inner_microstep: 4939.89 | bwd_allreduce_microstep: 50.63 | step_microstep: 181.66 [2024-07-31 07:50:57,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28211.77 | bwd: 40749.68 | bwd_inner: 39684.94 | bwd_allreduce: 1064.25 | step: 182.24 23%|██▎ | 288/1230 [5:39:03<18:19:53, 70.06s/it] {'loss': 1.2197, 'learning_rate': 1.7893958264673117e-05, 'epoch': 0.23} 23%|██▎ | 288/1230 [5:39:03<18:19:53, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4055 [2024-07-31 07:51:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.31 | bwd_microstep: 5188.69 | bwd_inner_microstep: 5169.58 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2188 [2024-07-31 07:51:15,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.93 | bwd_microstep: 5280.08 | bwd_inner_microstep: 4871.83 | bwd_allreduce_microstep: 408.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 07:51:24,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.83 | bwd_microstep: 5183.89 | bwd_inner_microstep: 5144.26 | bwd_allreduce_microstep: 39.57 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 07:51:33,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.28 | bwd_microstep: 4992.60 | bwd_inner_microstep: 4973.17 
| bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2112 [2024-07-31 07:51:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.90 | bwd_microstep: 5208.46 | bwd_inner_microstep: 4804.37 | bwd_allreduce_microstep: 404.02 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-31 07:51:50,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.00 | bwd_microstep: 4990.09 | bwd_inner_microstep: 4960.00 | bwd_allreduce_microstep: 30.02 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3663 [2024-07-31 07:51:59,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.01 | bwd_microstep: 4999.60 | bwd_inner_microstep: 4944.42 | bwd_allreduce_microstep: 55.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 07:52:08,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 07:52:08,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.97 | bwd_microstep: 5080.63 | bwd_inner_microstep: 5021.02 | bwd_allreduce_microstep: 59.55 | step_microstep: 182.35 [2024-07-31 07:52:08,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29103.15 | bwd: 40924.03 | bwd_inner: 39888.58 | bwd_allreduce: 1034.96 | step: 183.05 23%|██▎ | 289/1230 [5:40:14<18:20:09, 70.15s/it] {'loss': 1.2569, 'learning_rate': 1.7877765152855757e-05, 'epoch': 0.23} 23%|██▎ | 289/1230 [5:40:14<18:20:09, 70.15s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096 [2024-07-31 07:52:17,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.15 | bwd_microstep: 5312.08 | bwd_inner_microstep: 5292.94 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2261 [2024-07-31 07:52:26,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.27 | bwd_microstep: 5212.07 | bwd_inner_microstep: 4808.10 | bwd_allreduce_microstep: 403.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2300 [2024-07-31 07:52:34,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.26 | bwd_microstep: 5232.00 | bwd_inner_microstep: 4825.15 | bwd_allreduce_microstep: 406.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 07:52:43,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.08 | bwd_microstep: 4979.95 | bwd_inner_microstep: 4960.62 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 07:52:52,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.91 | bwd_microstep: 5165.50 | bwd_inner_microstep: 5110.54 | bwd_allreduce_microstep: 54.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3650 [2024-07-31 07:53:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.31 | bwd_microstep: 4929.93 | bwd_inner_microstep: 4902.79 | bwd_allreduce_microstep: 27.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 07:53:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.74 | bwd_microstep: 5103.46 | bwd_inner_microstep: 4705.90 | bwd_allreduce_microstep: 397.49 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2130 [2024-07-31 07:53:18,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 07:53:18,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3470.96 | bwd_microstep: 5042.58 | bwd_inner_microstep: 4650.94 | bwd_allreduce_microstep: 391.57 | step_microstep: 181.55 [2024-07-31 07:53:18,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28915.57 | bwd: 40977.56 | bwd_inner: 39256.93 | bwd_allreduce: 1720.13 | step: 182.13 24%|██▎ | 290/1230 [5:41:24<18:19:20, 70.17s/it] {'loss': 1.2133, 'learning_rate': 1.78615174122414e-05, 'epoch': 0.24} 24%|██▎ | 290/1230 [5:41:24<18:19:20, 70.17s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3967 [2024-07-31 07:53:27,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.99 | bwd_microstep: 5456.58 | bwd_inner_microstep: 5384.07 | bwd_allreduce_microstep: 72.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-31 07:53:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.61 | bwd_microstep: 5181.83 | bwd_inner_microstep: 4777.70 | bwd_allreduce_microstep: 404.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 07:53:45,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.11 | bwd_microstep: 5319.24 | bwd_inner_microstep: 4906.27 | bwd_allreduce_microstep: 412.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 07:53:53,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.73 | bwd_microstep: 5174.98 | bwd_inner_microstep: 5090.61 | bwd_allreduce_microstep: 84.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 07:54:02,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.90 | bwd_microstep: 4987.15 | bwd_inner_microstep: 4952.62 | bwd_allreduce_microstep: 34.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3650 [2024-07-31 07:54:10,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3185.18 | bwd_microstep: 4695.04 | bwd_inner_microstep: 4672.40 | bwd_allreduce_microstep: 22.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 07:54:19,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.68 | bwd_microstep: 4912.43 | bwd_inner_microstep: 4888.02 | bwd_allreduce_microstep: 24.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 07:54:28,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 07:54:28,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.81 | bwd_microstep: 5067.56 | bwd_inner_microstep: 5006.14 | bwd_allreduce_microstep: 61.35 | step_microstep: 181.66 [2024-07-31 07:54:28,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28593.92 | bwd: 40794.79 | bwd_inner: 39677.76 | bwd_allreduce: 1116.55 | step: 182.24 24%|██▎ | 291/1230 [5:42:33<18:16:03, 70.04s/it] {'loss': 1.1737, 'learning_rate': 1.78452151555009e-05, 'epoch': 0.24} 24%|██▎ | 291/1230 [5:42:33<18:16:03, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 07:54:37,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3867.78 | bwd_microstep: 5350.82 | bwd_inner_microstep: 5331.70 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3893 [2024-07-31 07:54:46,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.05 | bwd_microstep: 5205.35 | bwd_inner_microstep: 5158.09 | bwd_allreduce_microstep: 47.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-31 07:54:54,264] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3229.23 | bwd_microstep: 4826.22 | bwd_inner_microstep: 4806.94 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 07:55:02,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.55 | bwd_microstep: 5126.99 | bwd_inner_microstep: 5051.41 | bwd_allreduce_microstep: 75.52 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-31 07:55:11,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.55 | bwd_microstep: 5254.27 | bwd_inner_microstep: 4845.37 | bwd_allreduce_microstep: 408.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3637 [2024-07-31 07:55:20,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.11 | bwd_microstep: 5039.63 | bwd_inner_microstep: 4988.02 | bwd_allreduce_microstep: 51.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-31 07:55:29,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.07 | bwd_microstep: 5184.45 | bwd_inner_microstep: 4779.87 | bwd_allreduce_microstep: 404.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 07:55:38,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 07:55:38,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.21 | bwd_microstep: 4912.10 | bwd_inner_microstep: 4887.22 | bwd_allreduce_microstep: 24.82 | step_microstep: 183.23 [2024-07-31 07:55:38,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28858.46 | bwd: 40899.82 | bwd_inner: 39848.55 | bwd_allreduce: 1050.77 | step: 183.92 24%|██▎ | 292/1230 [5:43:44<18:15:10, 70.05s/it] {'loss': 1.1945, 'learning_rate': 
1.7828858495683162e-05, 'epoch': 0.24} 24%|██▎ | 292/1230 [5:43:44<18:15:10, 70.05s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3780 [2024-07-31 07:55:47,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.54 | bwd_microstep: 5163.48 | bwd_inner_microstep: 5124.27 | bwd_allreduce_microstep: 39.15 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3908 [2024-07-31 07:55:55,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.44 | bwd_microstep: 5230.54 | bwd_inner_microstep: 5167.85 | bwd_allreduce_microstep: 62.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-31 07:56:04,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.21 | bwd_microstep: 5025.31 | bwd_inner_microstep: 5005.99 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 07:56:13,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.49 | bwd_microstep: 5126.17 | bwd_inner_microstep: 5046.04 | bwd_allreduce_microstep: 80.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 07:56:22,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.62 | bwd_microstep: 4900.84 | bwd_inner_microstep: 4881.51 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3709 [2024-07-31 07:56:30,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.34 | bwd_microstep: 4986.03 | bwd_inner_microstep: 4932.44 | bwd_allreduce_microstep: 53.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 07:56:38,519] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3175.32 | bwd_microstep: 4704.63 | bwd_inner_microstep: 4681.91 | bwd_allreduce_microstep: 22.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 07:56:47,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 07:56:47,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.92 | bwd_microstep: 5013.56 | bwd_inner_microstep: 4962.60 | bwd_allreduce_microstep: 50.89 | step_microstep: 181.83 [2024-07-31 07:56:47,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28638.78 | bwd: 40150.56 | bwd_inner: 39802.55 | bwd_allreduce: 347.51 | step: 182.41 24%|██▍ | 293/1230 [5:44:53<18:09:38, 69.77s/it] {'loss': 1.1835, 'learning_rate': 1.781244754621434e-05, 'epoch': 0.24} 24%|██▍ | 293/1230 [5:44:53<18:09:38, 69.77s/it]dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 4017 [2024-07-31 07:56:56,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.93 | bwd_microstep: 5242.02 | bwd_inner_microstep: 5219.12 | bwd_allreduce_microstep: 22.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-31 07:57:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.13 | bwd_microstep: 5345.73 | bwd_inner_microstep: 5279.15 | bwd_allreduce_microstep: 66.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 07:57:13,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.17 | bwd_microstep: 4912.93 | bwd_inner_microstep: 4889.21 | bwd_allreduce_microstep: 23.65 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3255 [2024-07-31 07:57:22,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.01 | bwd_microstep: 5169.18 | bwd_inner_microstep: 
4999.83 | bwd_allreduce_microstep: 169.29 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2165 [2024-07-31 07:57:31,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.15 | bwd_microstep: 5279.94 | bwd_inner_microstep: 4871.57 | bwd_allreduce_microstep: 408.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 07:57:39,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3043.21 | bwd_microstep: 5024.18 | bwd_inner_microstep: 4634.46 | bwd_allreduce_microstep: 389.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 07:57:48,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.69 | bwd_microstep: 4915.44 | bwd_inner_microstep: 4890.80 | bwd_allreduce_microstep: 24.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-31 07:57:57,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 07:57:57,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.47 | bwd_microstep: 5200.10 | bwd_inner_microstep: 4793.91 | bwd_allreduce_microstep: 406.13 | step_microstep: 182.96 [2024-07-31 07:57:57,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28365.68 | bwd: 41089.50 | bwd_inner: 39578.00 | bwd_allreduce: 1511.03 | step: 183.54 24%|██▍ | 294/1230 [5:46:02<18:08:32, 69.78s/it] {'loss': 1.2762, 'learning_rate': 1.779598242089707e-05, 'epoch': 0.24} 24%|██▍ | 294/1230 [5:46:02<18:08:32, 69.78s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2567 [2024-07-31 07:58:06,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.51 | bwd_microstep: 5667.59 | bwd_inner_microstep: 5229.83 | bwd_allreduce_microstep: 437.69 | step_microstep: 0.08 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3799 [2024-07-31 07:58:15,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.77 | bwd_microstep: 5152.72 | bwd_inner_microstep: 5107.75 | bwd_allreduce_microstep: 44.90 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-31 07:58:24,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.24 | bwd_microstep: 5073.92 | bwd_inner_microstep: 5052.83 | bwd_allreduce_microstep: 21.02 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-31 07:58:32,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.61 | bwd_microstep: 5001.17 | bwd_inner_microstep: 4981.79 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3732 [2024-07-31 07:58:40,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3138.10 | bwd_microstep: 4843.13 | bwd_inner_microstep: 4812.95 | bwd_allreduce_microstep: 30.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 07:58:48,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3240.60 | bwd_microstep: 4860.12 | bwd_inner_microstep: 4818.06 | bwd_allreduce_microstep: 41.98 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 07:58:56,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.14 | bwd_microstep: 4688.28 | bwd_inner_microstep: 4667.78 | bwd_allreduce_microstep: 20.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 07:59:04,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 07:59:04,967] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3021.37 | bwd_microstep: 4900.52 | bwd_inner_microstep: 4522.37 | bwd_allreduce_microstep: 378.08 | step_microstep: 181.93 [2024-07-31 07:59:04,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27363.25 | bwd: 40187.44 | bwd_inner: 39193.31 | bwd_allreduce: 993.64 | step: 182.61 24%|██▍ | 295/1230 [5:47:10<17:58:31, 69.21s/it] {'loss': 1.182, 'learning_rate': 1.7779463233909677e-05, 'epoch': 0.24} 24%|██▍ | 295/1230 [5:47:10<17:58:31, 69.21s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 07:59:13,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.31 | bwd_microstep: 5208.50 | bwd_inner_microstep: 5189.36 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-31 07:59:22,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.09 | bwd_microstep: 4834.68 | bwd_inner_microstep: 4783.68 | bwd_allreduce_microstep: 50.92 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-31 07:59:30,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.02 | bwd_microstep: 5173.80 | bwd_inner_microstep: 5088.36 | bwd_allreduce_microstep: 85.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2081 [2024-07-31 07:59:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.74 | bwd_microstep: 5170.76 | bwd_inner_microstep: 4768.71 | bwd_allreduce_microstep: 401.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-31 07:59:47,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3321.94 | bwd_microstep: 4890.63 | bwd_inner_microstep: 4847.37 | bwd_allreduce_microstep: 43.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 
7.0, dynamic token length: 2180 [2024-07-31 07:59:55,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.76 | bwd_microstep: 4887.74 | bwd_inner_microstep: 4510.57 | bwd_allreduce_microstep: 377.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 08:00:04,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.14 | bwd_microstep: 5097.97 | bwd_inner_microstep: 5032.37 | bwd_allreduce_microstep: 65.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-07-31 08:00:13,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 08:00:13,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.66 | bwd_microstep: 4920.53 | bwd_inner_microstep: 4896.54 | bwd_allreduce_microstep: 23.92 | step_microstep: 181.66 [2024-07-31 08:00:13,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27682.55 | bwd: 40184.59 | bwd_inner: 39116.91 | bwd_allreduce: 1067.18 | step: 182.25 24%|██▍ | 296/1230 [5:48:19<17:52:38, 68.91s/it] {'loss': 1.1733, 'learning_rate': 1.7762890099805362e-05, 'epoch': 0.24} 24%|██▍ | 296/1230 [5:48:19<17:52:38, 68.91s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 08:00:22,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.58 | bwd_microstep: 5265.50 | bwd_inner_microstep: 5235.81 | bwd_allreduce_microstep: 29.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2344 [2024-07-31 08:00:30,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.23 | bwd_microstep: 5233.45 | bwd_inner_microstep: 4824.24 | bwd_allreduce_microstep: 409.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-07-31 08:00:39,761] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.96 | bwd_microstep: 5165.42 | bwd_inner_microstep: 5119.29 | bwd_allreduce_microstep: 46.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 08:00:48,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.50 | bwd_microstep: 5165.38 | bwd_inner_microstep: 5088.21 | bwd_allreduce_microstep: 77.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 08:00:57,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.50 | bwd_microstep: 4979.03 | bwd_inner_microstep: 4959.65 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 08:01:05,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.77 | bwd_microstep: 4908.71 | bwd_inner_microstep: 4889.33 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2156 [2024-07-31 08:01:13,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.68 | bwd_microstep: 4891.94 | bwd_inner_microstep: 4514.80 | bwd_allreduce_microstep: 377.07 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2121 [2024-07-31 08:01:22,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 08:01:22,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.74 | bwd_microstep: 5082.32 | bwd_inner_microstep: 4687.75 | bwd_allreduce_microstep: 394.49 | step_microstep: 181.88 [2024-07-31 08:01:22,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28431.86 | bwd: 40691.72 | bwd_inner: 39319.00 | bwd_allreduce: 1372.22 | step: 182.47 24%|██▍ | 297/1230 [5:49:28<17:54:03, 69.07s/it] {'loss': 1.2231, 
'learning_rate': 1.774626313351145e-05, 'epoch': 0.24} 24%|██▍ | 297/1230 [5:49:28<17:54:03, 69.07s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3977 [2024-07-31 08:01:31,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.20 | bwd_microstep: 5460.73 | bwd_inner_microstep: 5400.46 | bwd_allreduce_microstep: 60.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-31 08:01:40,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.90 | bwd_microstep: 5319.23 | bwd_inner_microstep: 5251.25 | bwd_allreduce_microstep: 67.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-31 08:01:49,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.18 | bwd_microstep: 5056.42 | bwd_inner_microstep: 5031.84 | bwd_allreduce_microstep: 24.51 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2831 [2024-07-31 08:01:57,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3297.00 | bwd_microstep: 5035.96 | bwd_inner_microstep: 4656.91 | bwd_allreduce_microstep: 378.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 08:02:05,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.48 | bwd_microstep: 4760.17 | bwd_inner_microstep: 4729.17 | bwd_allreduce_microstep: 30.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 08:02:14,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.12 | bwd_microstep: 5048.46 | bwd_inner_microstep: 4657.71 | bwd_allreduce_microstep: 390.68 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3647 [2024-07-31 08:02:22,382] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3083.09 | bwd_microstep: 4801.11 | bwd_inner_microstep: 4762.63 | bwd_allreduce_microstep: 38.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 08:02:31,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 08:02:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.24 | bwd_microstep: 4984.90 | bwd_inner_microstep: 4965.50 | bwd_allreduce_microstep: 19.34 | step_microstep: 209.04 [2024-07-31 08:02:31,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27894.11 | bwd: 40466.97 | bwd_inner: 39455.42 | bwd_allreduce: 1011.06 | step: 209.73 24%|██▍ | 298/1230 [5:50:37<17:51:16, 68.97s/it] {'loss': 1.1535, 'learning_rate': 1.7729582450328547e-05, 'epoch': 0.24} 24%|██▍ | 298/1230 [5:50:37<17:51:16, 68.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3977 [2024-07-31 08:02:40,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3819.45 | bwd_microstep: 5266.44 | bwd_inner_microstep: 5247.35 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3954 [2024-07-31 08:02:49,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.17 | bwd_microstep: 5256.61 | bwd_inner_microstep: 5211.54 | bwd_allreduce_microstep: 45.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 08:02:58,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.66 | bwd_microstep: 5229.65 | bwd_inner_microstep: 5140.77 | bwd_allreduce_microstep: 88.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756 [2024-07-31 08:03:07,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.83 | bwd_microstep: 
5173.66 | bwd_inner_microstep: 5118.12 | bwd_allreduce_microstep: 55.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 08:03:16,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.75 | bwd_microstep: 5331.82 | bwd_inner_microstep: 5229.61 | bwd_allreduce_microstep: 102.14 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3777 [2024-07-31 08:03:24,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.05 | bwd_microstep: 4927.84 | bwd_inner_microstep: 4890.43 | bwd_allreduce_microstep: 37.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 08:03:33,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.89 | bwd_microstep: 5052.79 | bwd_inner_microstep: 5012.12 | bwd_allreduce_microstep: 40.60 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2161 [2024-07-31 08:03:42,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 08:03:42,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.43 | bwd_microstep: 5138.19 | bwd_inner_microstep: 4742.29 | bwd_allreduce_microstep: 395.83 | step_microstep: 183.00 [2024-07-31 08:03:42,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29104.13 | bwd: 41376.98 | bwd_inner: 40592.17 | bwd_allreduce: 784.32 | step: 183.59 24%|██▍ | 299/1230 [5:51:48<17:58:43, 69.52s/it] {'loss': 1.2462, 'learning_rate': 1.7712848165929776e-05, 'epoch': 0.24} 24%|██▍ | 299/1230 [5:51:48<17:58:43, 69.52s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 08:03:51,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.63 | bwd_microstep: 5272.55 | bwd_inner_microstep: 5208.91 | bwd_allreduce_microstep: 63.58 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-31 08:03:59,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.49 | bwd_microstep: 5188.29 | bwd_inner_microstep: 5134.93 | bwd_allreduce_microstep: 53.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3837 [2024-07-31 08:04:08,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.48 | bwd_microstep: 5153.71 | bwd_inner_microstep: 5107.45 | bwd_allreduce_microstep: 46.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 08:04:16,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.77 | bwd_microstep: 4882.44 | bwd_inner_microstep: 4836.60 | bwd_allreduce_microstep: 45.75 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3620 [2024-07-31 08:04:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.07 | bwd_microstep: 5040.35 | bwd_inner_microstep: 4963.60 | bwd_allreduce_microstep: 76.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 08:04:34,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.69 | bwd_microstep: 5080.31 | bwd_inner_microstep: 5018.84 | bwd_allreduce_microstep: 61.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 08:04:42,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.85 | bwd_microstep: 4901.46 | bwd_inner_microstep: 4879.08 | bwd_allreduce_microstep: 22.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 08:04:51,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 08:04:51,524] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.76 | bwd_microstep: 4944.64 | bwd_inner_microstep: 4915.08 | bwd_allreduce_microstep: 29.48 | step_microstep: 182.47 [2024-07-31 08:04:51,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28570.65 | bwd: 40463.69 | bwd_inner: 40064.41 | bwd_allreduce: 398.75 | step: 183.05 24%|██▍ | 300/1230 [5:52:57<17:56:51, 69.48s/it] {'loss': 1.2039, 'learning_rate': 1.7696060396359956e-05, 'epoch': 0.24} 24%|██▍ | 300/1230 [5:52:57<17:56:51, 69.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3883 [2024-07-31 08:05:00,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.75 | bwd_microstep: 5498.68 | bwd_inner_microstep: 5421.09 | bwd_allreduce_microstep: 77.52 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2301 [2024-07-31 08:05:09,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.49 | bwd_microstep: 5224.97 | bwd_inner_microstep: 4817.40 | bwd_allreduce_microstep: 407.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 08:05:17,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.50 | bwd_microstep: 4840.82 | bwd_inner_microstep: 4796.90 | bwd_allreduce_microstep: 43.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3741 [2024-07-31 08:05:25,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3124.48 | bwd_microstep: 4977.45 | bwd_inner_microstep: 4935.36 | bwd_allreduce_microstep: 42.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 08:05:34,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.07 | bwd_microstep: 5244.35 | bwd_inner_microstep: 4838.18 | bwd_allreduce_microstep: 406.10 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 08:05:42,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.45 | bwd_microstep: 4704.01 | bwd_inner_microstep: 4679.22 | bwd_allreduce_microstep: 24.73 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-31 08:05:50,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.52 | bwd_microstep: 4894.14 | bwd_inner_microstep: 4517.41 | bwd_allreduce_microstep: 376.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-07-31 08:05:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 08:05:59,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.65 | bwd_microstep: 5091.61 | bwd_inner_microstep: 4698.84 | bwd_allreduce_microstep: 392.71 | step_microstep: 182.07 [2024-07-31 08:05:59,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26859.82 | bwd: 40476.03 | bwd_inner: 38704.34 | bwd_allreduce: 1771.20 | step: 182.67 24%|██▍ | 301/1230 [5:54:05<17:47:17, 68.93s/it] {'loss': 1.2023, 'learning_rate': 1.7679219258034798e-05, 'epoch': 0.24} 24%|██▍ | 301/1230 [5:54:05<17:47:17, 68.93s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2368 [2024-07-31 08:06:08,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.70 | bwd_microstep: 5242.10 | bwd_inner_microstep: 4839.19 | bwd_allreduce_microstep: 402.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3559 [2024-07-31 08:06:16,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.38 | bwd_microstep: 5100.03 | bwd_inner_microstep: 5021.97 | bwd_allreduce_microstep: 77.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 
08:06:25,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.86 | bwd_microstep: 5106.64 | bwd_inner_microstep: 5038.66 | bwd_allreduce_microstep: 67.92 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 08:06:34,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.54 | bwd_microstep: 5197.72 | bwd_inner_microstep: 5112.61 | bwd_allreduce_microstep: 85.03 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2065 [2024-07-31 08:06:42,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3460.04 | bwd_microstep: 5018.68 | bwd_inner_microstep: 4628.46 | bwd_allreduce_microstep: 390.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 08:06:51,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.74 | bwd_microstep: 4955.20 | bwd_inner_microstep: 4908.39 | bwd_allreduce_microstep: 46.74 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2847 [2024-07-31 08:06:59,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.85 | bwd_microstep: 5072.00 | bwd_inner_microstep: 4673.88 | bwd_allreduce_microstep: 398.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-31 08:07:07,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-31 08:07:07,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3002.00 | bwd_microstep: 4866.16 | bwd_inner_microstep: 4489.65 | bwd_allreduce_microstep: 376.44 | step_microstep: 181.72 [2024-07-31 08:07:07,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27770.01 | bwd: 40558.51 | bwd_inner: 38712.75 | bwd_allreduce: 1845.26 | step: 182.39 25%|██▍ | 302/1230 [5:55:13<17:44:54, 68.85s/it] 
{'loss': 1.2235, 'learning_rate': 1.7662324867740102e-05, 'epoch': 0.25} 25%|██▍ | 302/1230 [5:55:13<17:44:54, 68.85s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 08:07:17,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3896.05 | bwd_microstep: 5406.90 | bwd_inner_microstep: 5387.86 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2287 [2024-07-31 08:07:25,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.70 | bwd_microstep: 5212.05 | bwd_inner_microstep: 4807.19 | bwd_allreduce_microstep: 404.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-07-31 08:07:34,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.52 | bwd_microstep: 5184.54 | bwd_inner_microstep: 5096.03 | bwd_allreduce_microstep: 88.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3660 [2024-07-31 08:07:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.41 | bwd_microstep: 4866.74 | bwd_inner_microstep: 4847.39 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 08:07:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.41 | bwd_microstep: 5002.72 | bwd_inner_microstep: 4980.73 | bwd_allreduce_microstep: 21.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-31 08:08:00,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.70 | bwd_microstep: 4989.13 | bwd_inner_microstep: 4969.73 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 08:08:09,488] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.69 | bwd_microstep: 4930.31 | bwd_inner_microstep: 4905.86 | bwd_allreduce_microstep: 24.38 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2144 [2024-07-31 08:08:18,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 08:08:18,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3453.72 | bwd_microstep: 5040.44 | bwd_inner_microstep: 4650.42 | bwd_allreduce_microstep: 389.95 | step_microstep: 181.14 [2024-07-31 08:08:18,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29363.10 | bwd: 40632.80 | bwd_inner: 39645.15 | bwd_allreduce: 987.16 | step: 181.72 25%|██▍ | 303/1230 [5:56:24<17:50:35, 69.29s/it] {'loss': 1.1846, 'learning_rate': 1.7645377342630956e-05, 'epoch': 0.25} 25%|██▍ | 303/1230 [5:56:24<17:50:35, 69.29s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4012 [2024-07-31 08:08:27,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.02 | bwd_microstep: 5117.40 | bwd_inner_microstep: 5096.87 | bwd_allreduce_microstep: 20.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3915 [2024-07-31 08:08:36,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.95 | bwd_microstep: 5285.46 | bwd_inner_microstep: 5247.67 | bwd_allreduce_microstep: 37.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785 [2024-07-31 08:08:45,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.40 | bwd_microstep: 5057.63 | bwd_inner_microstep: 5034.39 | bwd_allreduce_microstep: 23.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 08:08:53,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.80 | bwd_microstep: 
4988.13 | bwd_inner_microstep: 4968.47 | bwd_allreduce_microstep: 19.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 08:09:02,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.90 | bwd_microstep: 5206.16 | bwd_inner_microstep: 5125.16 | bwd_allreduce_microstep: 80.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-31 08:09:11,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.70 | bwd_microstep: 5187.31 | bwd_inner_microstep: 4783.09 | bwd_allreduce_microstep: 404.15 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2116 [2024-07-31 08:09:20,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.88 | bwd_microstep: 5239.17 | bwd_inner_microstep: 4832.04 | bwd_allreduce_microstep: 407.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 08:09:28,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 08:09:28,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.15 | bwd_microstep: 4889.40 | bwd_inner_microstep: 4869.96 | bwd_allreduce_microstep: 19.37 | step_microstep: 182.08 [2024-07-31 08:09:28,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29453.69 | bwd: 40970.65 | bwd_inner: 39957.61 | bwd_allreduce: 1012.55 | step: 182.65 25%|██▍ | 304/1230 [5:57:34<17:56:13, 69.73s/it] {'loss': 1.2434, 'learning_rate': 1.76283768002309e-05, 'epoch': 0.25} 25%|██▍ | 304/1230 [5:57:34<17:56:13, 69.73s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3864 [2024-07-31 08:09:38,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.64 | bwd_microstep: 5602.55 | bwd_inner_microstep: 5485.73 | bwd_allreduce_microstep: 116.75 | 
step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3556 [2024-07-31 08:09:46,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3108.32 | bwd_microstep: 4963.44 | bwd_inner_microstep: 4884.46 | bwd_allreduce_microstep: 78.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 08:09:55,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.63 | bwd_microstep: 5165.07 | bwd_inner_microstep: 5109.70 | bwd_allreduce_microstep: 55.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 08:10:03,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.53 | bwd_microstep: 4956.58 | bwd_inner_microstep: 4927.94 | bwd_allreduce_microstep: 28.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 08:10:11,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3040.28 | bwd_microstep: 4982.68 | bwd_inner_microstep: 4599.17 | bwd_allreduce_microstep: 383.43 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 08:10:20,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.90 | bwd_microstep: 5089.37 | bwd_inner_microstep: 5043.43 | bwd_allreduce_microstep: 45.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 08:10:29,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.59 | bwd_microstep: 5073.28 | bwd_inner_microstep: 4682.04 | bwd_allreduce_microstep: 391.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 08:10:38,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 08:10:38,249] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.70 | bwd_microstep: 5055.41 | bwd_inner_microstep: 4989.12 | bwd_allreduce_microstep: 66.22 | step_microstep: 182.01 [2024-07-31 08:10:38,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28088.49 | bwd: 40888.35 | bwd_inner: 39721.54 | bwd_allreduce: 1166.33 | step: 182.60 25%|██▍ | 305/1230 [5:58:44<17:53:06, 69.61s/it] {'loss': 1.2344, 'learning_rate': 1.7611323358431145e-05, 'epoch': 0.25} 25%|██▍ | 305/1230 [5:58:44<17:53:06, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3959 [2024-07-31 08:10:47,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.12 | bwd_microstep: 5267.43 | bwd_inner_microstep: 5223.04 | bwd_allreduce_microstep: 44.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3864 [2024-07-31 08:10:56,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.58 | bwd_microstep: 5135.95 | bwd_inner_microstep: 5092.02 | bwd_allreduce_microstep: 43.85 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2864 [2024-07-31 08:11:04,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.83 | bwd_microstep: 5216.35 | bwd_inner_microstep: 4809.91 | bwd_allreduce_microstep: 406.38 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3770 [2024-07-31 08:11:13,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.67 | bwd_microstep: 5211.10 | bwd_inner_microstep: 5118.49 | bwd_allreduce_microstep: 92.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 08:11:22,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.86 | bwd_microstep: 4998.06 | bwd_inner_microstep: 4976.89 | bwd_allreduce_microstep: 21.10 | step_microstep: 0.08 
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 08:11:30,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.04 | bwd_microstep: 4939.77 | bwd_inner_microstep: 4558.72 | bwd_allreduce_microstep: 380.99 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3647
[2024-07-31 08:11:39,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.05 | bwd_microstep: 5061.48 | bwd_inner_microstep: 4974.10 | bwd_allreduce_microstep: 87.31 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2119
[2024-07-31 08:11:47,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 08:11:47,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.11 | bwd_microstep: 5127.53 | bwd_inner_microstep: 4731.48 | bwd_allreduce_microstep: 395.98 | step_microstep: 182.68
[2024-07-31 08:11:47,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28382.18 | bwd: 40957.64 | bwd_inner: 39484.58 | bwd_allreduce: 1472.58 | step: 183.25
25%|██▍ | 306/1230 [5:59:53<17:52:14, 69.63s/it]
{'loss': 1.2312, 'learning_rate': 1.759421713548971e-05, 'epoch': 0.25}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4063
[2024-07-31 08:11:57,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3856.53 | bwd_microstep: 5355.13 | bwd_inner_microstep: 5336.04 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721
[2024-07-31 08:12:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.93 | bwd_microstep: 5119.19 | bwd_inner_microstep: 5088.87 | bwd_allreduce_microstep: 30.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777
[2024-07-31 08:12:14,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.61 | bwd_microstep: 5223.09 | bwd_inner_microstep: 5166.18 | bwd_allreduce_microstep: 56.84 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3726
[2024-07-31 08:12:23,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3119.49 | bwd_microstep: 4984.91 | bwd_inner_microstep: 4938.46 | bwd_allreduce_microstep: 46.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728
[2024-07-31 08:12:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.53 | bwd_microstep: 5069.19 | bwd_inner_microstep: 5027.64 | bwd_allreduce_microstep: 41.47 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729
[2024-07-31 08:12:40,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.66 | bwd_microstep: 4939.20 | bwd_inner_microstep: 4907.73 | bwd_allreduce_microstep: 31.41 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3707
[2024-07-31 08:12:48,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.02 | bwd_microstep: 5047.01 | bwd_inner_microstep: 4979.07 | bwd_allreduce_microstep: 67.88 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673
[2024-07-31 08:12:57,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 08:12:57,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.68 | bwd_microstep: 5061.27 | bwd_inner_microstep: 5001.36 | bwd_allreduce_microstep: 59.84 | step_microstep: 182.85
[2024-07-31 08:12:57,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28647.34 | bwd: 40798.96 | bwd_inner: 40445.26 | bwd_allreduce: 353.20 | step: 183.44
25%|██▍ | 307/1230 [6:01:03<17:51:48, 69.67s/it]
{'loss': 1.1927, 'learning_rate': 1.7577058250030655e-05, 'epoch': 0.25}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3844
[2024-07-31 08:13:06,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3839.64 | bwd_microstep: 5210.73 | bwd_inner_microstep: 5180.21 | bwd_allreduce_microstep: 30.44 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2361
[2024-07-31 08:13:15,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.62 | bwd_microstep: 5258.70 | bwd_inner_microstep: 4849.04 | bwd_allreduce_microstep: 409.59 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2057
[2024-07-31 08:13:24,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.33 | bwd_microstep: 5250.04 | bwd_inner_microstep: 4843.55 | bwd_allreduce_microstep: 406.43 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3763
[2024-07-31 08:13:32,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3115.80 | bwd_microstep: 4894.81 | bwd_inner_microstep: 4862.54 | bwd_allreduce_microstep: 32.20 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2083
[2024-07-31 08:13:41,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.93 | bwd_microstep: 5207.65 | bwd_inner_microstep: 4802.37 | bwd_allreduce_microstep: 405.21 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159
[2024-07-31 08:13:49,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3464.88 | bwd_microstep: 5025.19 | bwd_inner_microstep: 4634.15 | bwd_allreduce_microstep: 390.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3646
[2024-07-31 08:13:58,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.13 | bwd_microstep: 5046.12 | bwd_inner_microstep: 4967.89 | bwd_allreduce_microstep: 78.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668
[2024-07-31 08:14:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 08:14:07,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.06 | bwd_microstep: 4983.30 | bwd_inner_microstep: 4932.85 | bwd_allreduce_microstep: 50.37 | step_microstep: 181.94
[2024-07-31 08:14:07,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28169.30 | bwd: 40876.51 | bwd_inner: 39072.56 | bwd_allreduce: 1803.47 | step: 182.51
25%|██▌ | 308/1230 [6:02:12<17:49:15, 69.58s/it]
{'loss': 1.2007, 'learning_rate': 1.7559846821043205e-05, 'epoch': 0.25}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2438
[2024-07-31 08:14:15,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3174.54 | bwd_microstep: 5416.73 | bwd_inner_microstep: 5003.34 | bwd_allreduce_microstep: 413.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818
[2024-07-31 08:14:24,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.70 | bwd_microstep: 5267.06 | bwd_inner_microstep: 5203.89 | bwd_allreduce_microstep: 63.10 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603
[2024-07-31 08:14:32,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.15 | bwd_microstep: 4839.84 | bwd_inner_microstep: 4797.19 | bwd_allreduce_microstep: 42.59 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 08:14:41,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.91 | bwd_microstep: 5001.23 | bwd_inner_microstep: 4978.79 | bwd_allreduce_microstep: 22.37 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 08:14:50,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.21 | bwd_microstep: 5041.64 | bwd_inner_microstep: 5014.94 | bwd_allreduce_microstep: 26.63 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 08:14:59,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.40 | bwd_microstep: 4995.19 | bwd_inner_microstep: 4975.77 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235
[2024-07-31 08:15:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.10 | bwd_microstep: 5104.63 | bwd_inner_microstep: 4707.63 | bwd_allreduce_microstep: 396.94 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3848
[2024-07-31 08:15:16,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75
[2024-07-31 08:15:16,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.55 | bwd_microstep: 5015.99 | bwd_inner_microstep: 4996.61 | bwd_allreduce_microstep: 19.30 | step_microstep: 181.56
[2024-07-31 08:15:16,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28500.45 | bwd: 40682.29 | bwd_inner: 39678.10 | bwd_allreduce: 1003.70 | step: 182.14
25%|██▌ | 309/1230 [6:03:22<17:47:47, 69.56s/it]
{'loss': 1.2099, 'learning_rate': 1.754258296788097e-05, 'epoch': 0.25}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2394
[2024-07-31 08:15:25,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.50 | bwd_microstep: 5447.44 | bwd_inner_microstep: 5027.09 | bwd_allreduce_microstep: 420.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3711
[2024-07-31 08:15:34,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.74 | bwd_microstep: 5231.98 | bwd_inner_microstep: 5138.74 | bwd_allreduce_microstep: 93.17 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2049
[2024-07-31 08:15:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.53 | bwd_microstep: 5207.82 | bwd_inner_microstep: 4803.70 | bwd_allreduce_microstep: 404.05 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2247
[2024-07-31 08:15:52,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.97 | bwd_microstep: 5200.42 | bwd_inner_microstep: 4794.82 | bwd_allreduce_microstep: 405.53 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728
[2024-07-31 08:16:00,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.20 | bwd_microstep: 5108.12 | bwd_inner_microstep: 5055.47 | bwd_allreduce_microstep: 52.59 | step_microstep: 0.07
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648
[2024-07-31 08:16:09,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.18 | bwd_microstep: 5185.32 | bwd_inner_microstep: 5105.25 | bwd_allreduce_microstep: 80.00 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 08:16:18,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.02 | bwd_microstep: 4986.65 | bwd_inner_microstep: 4967.21 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 08:16:26,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 08:16:26,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.57 | bwd_microstep: 4779.14 | bwd_inner_microstep: 4744.75 | bwd_allreduce_microstep: 34.32 | step_microstep: 182.13
[2024-07-31 08:16:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28468.62 | bwd: 41146.88 | bwd_inner: 39636.99 | bwd_allreduce: 1509.40 | step: 182.70
25%|██▌ | 310/1230 [6:04:32<17:48:24, 69.68s/it]
{'loss': 1.2793, 'learning_rate': 1.7525266810261096e-05, 'epoch': 0.25}
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2233
[2024-07-31 08:16:35,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.66 | bwd_microstep: 5364.25 | bwd_inner_microstep: 4951.03 | bwd_allreduce_microstep: 413.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818
[2024-07-31 08:16:43,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3375.98 | bwd_microstep: 5020.10 | bwd_inner_microstep: 4991.31 | bwd_allreduce_microstep: 28.73 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3774
[2024-07-31 08:16:52,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3168.98 | bwd_microstep: 4930.75 | bwd_inner_microstep: 4899.38 | bwd_allreduce_microstep: 31.30 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1120
[2024-07-31 08:17:00,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.35 | bwd_microstep: 5276.38 | bwd_inner_microstep: 4871.58 | bwd_allreduce_microstep: 404.74 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2072
[2024-07-31 08:17:09,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.13 | bwd_microstep: 5111.49 | bwd_inner_microstep: 4713.00 | bwd_allreduce_microstep: 398.42 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647
[2024-07-31 08:17:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.31 | bwd_microstep: 5008.86 | bwd_inner_microstep: 4955.71 | bwd_allreduce_microstep: 53.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704
[2024-07-31 08:17:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.61 | bwd_microstep: 4909.48 | bwd_inner_microstep: 4890.02 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670
[2024-07-31 08:17:35,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71
[2024-07-31 08:17:35,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.40 | bwd_microstep: 4934.80 | bwd_inner_microstep: 4908.77 | bwd_allreduce_microstep: 25.96 | step_microstep: 181.32
[2024-07-31 08:17:35,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28135.31 | bwd: 40556.11 | bwd_inner: 39180.74 | bwd_allreduce: 1374.88 | step: 181.91
25%|██▌ | 311/1230 [6:05:41<17:44:12, 69.48s/it]
{'loss': 1.2342, 'learning_rate': 1.7507898468263422e-05, 'epoch': 0.25}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3814
[2024-07-31 08:17:44,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.57 | bwd_microstep: 5142.19 | bwd_inner_microstep: 5102.43 | bwd_allreduce_microstep: 39.69 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 08:17:53,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.31 | bwd_microstep: 4995.79 | bwd_inner_microstep: 4974.18 | bwd_allreduce_microstep: 21.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774
[2024-07-31 08:18:01,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.24 | bwd_microstep: 4997.75 | bwd_inner_microstep: 4978.38 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3616
[2024-07-31 08:18:10,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3114.83 | bwd_microstep: 5014.48 | bwd_inner_microstep: 4938.43 | bwd_allreduce_microstep: 75.99 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174
[2024-07-31 08:18:18,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3435.99 | bwd_microstep: 5006.41 | bwd_inner_microstep: 4619.47 | bwd_allreduce_microstep: 386.88 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142
[2024-07-31 08:18:27,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.59 | bwd_microstep: 5124.49 | bwd_inner_microstep: 4727.15 | bwd_allreduce_microstep: 397.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648
[2024-07-31 08:18:35,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.88 | bwd_microstep: 4684.67 | bwd_inner_microstep: 4660.54 | bwd_allreduce_microstep: 24.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676
[2024-07-31 08:18:43,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 08:18:43,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.41 | bwd_microstep: 5014.93 | bwd_inner_microstep: 4962.23 | bwd_allreduce_microstep: 52.62 | step_microstep: 182.76
[2024-07-31 08:18:43,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27947.73 | bwd: 39980.71 | bwd_inner: 38962.75 | bwd_allreduce: 1017.46 | step: 183.33
25%|██▌ | 312/1230 [6:06:49<17:37:27, 69.12s/it]
{'loss': 1.2643, 'learning_rate': 1.7490478062329686e-05, 'epoch': 0.25}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3969
[2024-07-31 08:18:52,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3343.28 | bwd_microstep: 5045.30 | bwd_inner_microstep: 5026.25 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742
[2024-07-31 08:19:00,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.99 | bwd_microstep: 4983.02 | bwd_inner_microstep: 4963.71 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232
[2024-07-31 08:19:09,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.13 | bwd_microstep: 5023.18 | bwd_inner_microstep: 4634.95 | bwd_allreduce_microstep: 388.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629
[2024-07-31 08:19:17,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.32 | bwd_microstep: 4780.43 | bwd_inner_microstep: 4744.04 | bwd_allreduce_microstep: 36.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180
[2024-07-31 08:19:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.84 | bwd_microstep: 5111.29 | bwd_inner_microstep: 4715.01 | bwd_allreduce_microstep: 396.20 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3705
[2024-07-31 08:19:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.38 | bwd_microstep: 5172.03 | bwd_inner_microstep: 5113.51 | bwd_allreduce_microstep: 58.45 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 08:19:43,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.60 | bwd_microstep: 4905.07 | bwd_inner_microstep: 4885.71 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 08:19:51,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70
[2024-07-31 08:19:51,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.79 | bwd_microstep: 4887.71 | bwd_inner_microstep: 4868.26 | bwd_allreduce_microstep: 19.37 | step_microstep: 181.84
[2024-07-31 08:19:51,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27879.23 | bwd: 39908.02 | bwd_inner: 38951.39 | bwd_allreduce: 956.13 | step: 182.43
25%|██▌ | 313/1230 [6:07:57<17:31:44, 68.82s/it]
{'loss': 1.2237, 'learning_rate': 1.7473005713262644e-05, 'epoch': 0.25}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2471
[2024-07-31 08:20:00,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3351.72 | bwd_microstep: 5437.39 | bwd_inner_microstep: 5022.42 | bwd_allreduce_microstep: 414.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789
[2024-07-31 08:20:09,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.84 | bwd_microstep: 5027.49 | bwd_inner_microstep: 5006.55 | bwd_allreduce_microstep: 20.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615
[2024-07-31 08:20:17,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3348.58 | bwd_microstep: 4962.43 | bwd_inner_microstep: 4914.39 | bwd_allreduce_microstep: 47.96 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3770
[2024-07-31 08:20:26,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.97 | bwd_microstep: 5009.54 | bwd_inner_microstep: 4963.10 | bwd_allreduce_microstep: 46.38 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616
[2024-07-31 08:20:35,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.61 | bwd_microstep: 5078.59 | bwd_inner_microstep: 5011.18 | bwd_allreduce_microstep: 67.34 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2137
[2024-07-31 08:20:43,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.32 | bwd_microstep: 5132.44 | bwd_inner_microstep: 4735.57 | bwd_allreduce_microstep: 396.80 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3662
[2024-07-31 08:20:52,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3346.15 | bwd_microstep: 4947.28 | bwd_inner_microstep: 4889.53 | bwd_allreduce_microstep: 57.67 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687
[2024-07-31 08:21:00,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77
[2024-07-31 08:21:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.11 | bwd_microstep: 4879.92 | bwd_inner_microstep: 4860.51 | bwd_allreduce_microstep: 19.34 | step_microstep: 182.16
[2024-07-31 08:21:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28166.19 | bwd: 40475.07 | bwd_inner: 39403.19 | bwd_allreduce: 1071.36 | step: 182.85
26%|██▌ | 314/1230 [6:09:06<17:31:20, 68.86s/it]
{'loss': 1.2549, 'learning_rate': 1.745548154222527e-05, 'epoch': 0.26}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3970
[2024-07-31 08:21:09,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.40 | bwd_microstep: 5236.99 | bwd_inner_microstep: 5206.70 | bwd_allreduce_microstep: 30.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3873
[2024-07-31 08:21:18,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.41 | bwd_microstep: 5133.40 | bwd_inner_microstep: 5112.38 | bwd_allreduce_microstep: 20.96 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3625
[2024-07-31 08:21:27,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.26 | bwd_microstep: 5113.54 | bwd_inner_microstep: 5025.06 | bwd_allreduce_microstep: 88.42 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618
[2024-07-31 08:21:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.52 | bwd_microstep: 4892.49 | bwd_inner_microstep: 4845.94 | bwd_allreduce_microstep: 46.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 08:21:44,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.30 | bwd_microstep: 5178.14 | bwd_inner_microstep: 5106.19 | bwd_allreduce_microstep: 71.88 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3905
[2024-07-31 08:21:53,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.38 | bwd_microstep: 5104.73 | bwd_inner_microstep: 5067.78 | bwd_allreduce_microstep: 36.88 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2239
[2024-07-31 08:22:01,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3446.87 | bwd_microstep: 5032.93 | bwd_inner_microstep: 4644.58 | bwd_allreduce_microstep: 388.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 08:22:10,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 08:22:10,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5097.80 | bwd_inner_microstep: 5033.35 | bwd_allreduce_microstep: 64.37 | step_microstep: 181.46
[2024-07-31 08:22:10,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28620.85 | bwd: 40790.00 | bwd_inner: 40041.92 | bwd_allreduce: 747.60 | step: 182.06
26%|██▌ | 315/1230 [6:10:16<17:34:13, 69.13s/it]
{'loss': 1.2176, 'learning_rate': 1.7437905670739893e-05, 'epoch': 0.26}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096
[2024-07-31 08:22:19,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.54 | bwd_microstep: 5495.76 | bwd_inner_microstep: 5476.69 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3915
[2024-07-31 08:22:28,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.37 | bwd_microstep: 5180.04 | bwd_inner_microstep: 5157.06 | bwd_allreduce_microstep: 22.91 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1779
[2024-07-31 08:22:37,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.16 | bwd_microstep: 5203.26 | bwd_inner_microstep: 4799.12 | bwd_allreduce_microstep: 404.07 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3948
[2024-07-31 08:22:46,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.56 | bwd_microstep: 5177.55 | bwd_inner_microstep: 5158.26 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756
[2024-07-31 08:22:55,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.61 | bwd_microstep: 5192.79 | bwd_inner_microstep: 5135.37 | bwd_allreduce_microstep: 57.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738
[2024-07-31 08:23:04,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.03 | bwd_microstep: 5029.74 | bwd_inner_microstep: 5005.02 | bwd_allreduce_microstep: 24.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792
[2024-07-31 08:23:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.35 | bwd_microstep: 5156.50 | bwd_inner_microstep: 5107.78 | bwd_allreduce_microstep: 48.66 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759
[2024-07-31 08:23:22,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78
[2024-07-31 08:23:22,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.83 | bwd_microstep: 5001.90 | bwd_inner_microstep: 4982.48 | bwd_allreduce_microstep: 19.35 | step_microstep: 182.71
[2024-07-31 08:23:22,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29647.35 | bwd: 41437.52 | bwd_inner: 40821.73 | bwd_allreduce: 615.30 | step: 183.29
26%|██▌ | 316/1230 [6:11:27<17:43:33, 69.82s/it]
{'loss': 1.171, 'learning_rate': 1.7420278220687366e-05, 'epoch': 0.26}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4073
[2024-07-31 08:23:31,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.94 | bwd_microstep: 5234.09 | bwd_inner_microstep: 5215.02 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3832
[2024-07-31 08:23:40,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.44 | bwd_microstep: 5332.75 | bwd_inner_microstep: 5279.96 | bwd_allreduce_microstep: 52.71 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678
[2024-07-31 08:23:48,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.59 | bwd_microstep: 5127.91 | bwd_inner_microstep: 5059.92 | bwd_allreduce_microstep: 67.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3088
[2024-07-31 08:23:57,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.11 | bwd_microstep: 5244.65 | bwd_inner_microstep: 4945.28 | bwd_allreduce_microstep: 299.30 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 08:24:05,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.62 | bwd_microstep: 4773.76 | bwd_inner_microstep: 4740.84 | bwd_allreduce_microstep: 32.86 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725
[2024-07-31 08:24:14,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.23 | bwd_microstep: 5179.19 | bwd_inner_microstep: 5137.72 | bwd_allreduce_microstep: 41.40 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2123
[2024-07-31 08:24:23,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.49 | bwd_microstep: 5149.67 | bwd_inner_microstep: 4749.35 | bwd_allreduce_microstep: 400.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142
[2024-07-31 08:24:32,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 08:24:32,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.71 | bwd_microstep: 5235.01 | bwd_inner_microstep: 4828.20 | bwd_allreduce_microstep: 406.73 | step_microstep: 181.62
[2024-07-31 08:24:32,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28694.03 | bwd: 41277.01 | bwd_inner: 39956.23 | bwd_allreduce: 1320.28 | step: 182.22
26%|██▌ | 317/1230 [6:12:38<17:44:36, 69.96s/it]
{'loss': 1.2079, 'learning_rate': 1.7402599314306207e-05, 'epoch': 0.26}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3659
[2024-07-31 08:24:41,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.57 | bwd_microstep: 5346.69 | bwd_inner_microstep: 5229.88 | bwd_allreduce_microstep: 116.74 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882
[2024-07-31 08:24:50,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.75 | bwd_microstep: 5157.52 | bwd_inner_microstep: 5116.26 | bwd_allreduce_microstep: 41.19 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3838
[2024-07-31 08:24:59,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.85 | bwd_microstep: 5144.04 | bwd_inner_microstep: 5109.19 | bwd_allreduce_microstep: 34.78 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775
[2024-07-31 08:25:07,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.24 | bwd_microstep: 5012.09 | bwd_inner_microstep: 4992.62 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717
[2024-07-31 08:25:16,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.76 | bwd_microstep: 5038.71 | bwd_inner_microstep: 5012.40 | bwd_allreduce_microstep: 26.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 08:25:25,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.88 | bwd_microstep: 5091.47 | bwd_inner_microstep: 5044.03 | bwd_allreduce_microstep: 47.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 08:25:34,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.59 | bwd_microstep: 5136.24 | bwd_inner_microstep: 5088.36 | bwd_allreduce_microstep: 47.81 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3679
[2024-07-31 08:25:43,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47
[2024-07-31 08:25:43,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.56 | bwd_microstep: 5141.62 | bwd_inner_microstep: 5052.37 | bwd_allreduce_microstep: 89.18 | step_microstep: 181.46
[2024-07-31 08:25:43,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29254.10 | bwd: 41068.37 | bwd_inner: 40645.05 | bwd_allreduce: 422.82 | step: 182.04
26%|██▌ | 318/1230 [6:13:48<17:46:35, 70.17s/it]
{'loss': 1.2201, 'learning_rate': 1.7384869074191777e-05, 'epoch': 0.26}
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 959
[2024-07-31 08:25:51,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.93 | bwd_microstep: 5343.08 | bwd_inner_microstep: 4930.32 | bwd_allreduce_microstep: 412.69 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4012
[2024-07-31 08:26:01,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3847.54 | bwd_microstep: 5249.15 | bwd_inner_microstep: 5229.81 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774
[2024-07-31 08:26:09,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.46 | bwd_microstep: 5126.99 | bwd_inner_microstep: 5094.11 | bwd_allreduce_microstep: 32.81 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 08:26:18,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.82 | bwd_microstep: 4989.29 | bwd_inner_microstep: 4969.98 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754
[2024-07-31 08:26:26,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3271.53 | bwd_microstep: 4924.88 | bwd_inner_microstep: 4894.67 | bwd_allreduce_microstep: 30.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 08:26:34,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.14 | bwd_microstep: 4694.76 | bwd_inner_microstep: 4675.40 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 08:26:43,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.04 | bwd_microstep: 5056.15 | bwd_inner_microstep: 4664.31 | bwd_allreduce_microstep: 391.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 08:26:52,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 08:26:52,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.46 | bwd_microstep: 5018.48 | bwd_inner_microstep: 4969.22 | bwd_allreduce_microstep: 49.19 | step_microstep: 182.53
[2024-07-31 08:26:52,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28403.80 | bwd: 40402.75 | bwd_inner: 39427.77 | bwd_allreduce: 974.48 | step: 183.11
26%|██▌ | 319/1230 [6:14:58<17:40:42, 69.86s/it]
{'loss': 1.2257, 'learning_rate': 1.7367087623295394e-05, 'epoch': 0.26}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3996
[2024-07-31 08:27:01,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3886.42 | bwd_microstep: 5335.74 | bwd_inner_microstep: 5306.30 | bwd_allreduce_microstep: 29.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 08:27:10,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.09 | bwd_microstep: 5318.05 | bwd_inner_microstep: 5220.09 | bwd_allreduce_microstep: 97.89 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898
[2024-07-31 08:27:19,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.31 | bwd_microstep: 5239.30 | bwd_inner_microstep: 5188.65 | bwd_allreduce_microstep: 50.58 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732
[2024-07-31 08:27:28,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.93 | bwd_microstep: 5029.42 | bwd_inner_microstep: 5005.76 | bwd_allreduce_microstep: 23.58 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721
[2024-07-31 08:27:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.99 | bwd_microstep: 5049.14 | bwd_inner_microstep: 5022.59 | bwd_allreduce_microstep: 26.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629
[2024-07-31 08:27:44,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.34 | bwd_microstep: 4746.06 | bwd_inner_microstep: 4712.16 | bwd_allreduce_microstep: 33.83 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698
[2024-07-31 08:27:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.40 | bwd_microstep: 4846.14 | bwd_inner_microstep: 4804.83 | bwd_allreduce_microstep: 41.23 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714
[2024-07-31 08:28:01,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 08:28:01,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.69 | bwd_microstep: 5049.43 | bwd_inner_microstep: 5005.82 | bwd_allreduce_microstep: 43.55 | step_microstep: 181.83
[2024-07-31 08:28:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28710.09 | bwd: 40613.26 | bwd_inner: 40266.15 | bwd_allreduce: 346.61 | step: 182.43
26%|██▌ | 320/1230 [6:16:07<17:38:38, 69.80s/it]
{'loss': 1.1221, 'learning_rate': 1.7349255084923517e-05, 'epoch': 0.26}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4086
[2024-07-31 08:28:11,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.23 | bwd_microstep: 5496.11 | bwd_inner_microstep: 5451.09 | bwd_allreduce_microstep: 44.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 08:28:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.26 | bwd_microstep: 4824.15 | bwd_inner_microstep: 4798.52 | bwd_allreduce_microstep: 25.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879
[2024-07-31 08:28:27,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.09 | bwd_microstep: 5114.74 | bwd_inner_microstep: 5077.21 | bwd_allreduce_microstep: 37.46 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3727
[2024-07-31 08:28:36,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.23 | bwd_microstep: 5169.68 | bwd_inner_microstep: 5096.27 | bwd_allreduce_microstep: 73.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706
[2024-07-31 08:28:45,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.93 | bwd_microstep: 4938.38 | bwd_inner_microstep: 4911.18 | bwd_allreduce_microstep: 27.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 08:28:53,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3328.95 | bwd_microstep: 5017.88 | bwd_inner_microstep: 4961.49 | bwd_allreduce_microstep: 56.32 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3025
[2024-07-31 08:29:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.80 | bwd_microstep: 5072.27 | bwd_inner_microstep: 4788.81 | bwd_allreduce_microstep: 283.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238
[2024-07-31 08:29:10,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 08:29:10,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3000.34 | bwd_microstep: 4865.90 | bwd_inner_microstep: 4491.89 | bwd_allreduce_microstep: 373.95 | step_microstep: 180.92
[2024-07-31 08:29:10,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27784.74 | bwd: 40499.09 | bwd_inner: 39576.39 | bwd_allreduce: 922.20 | step: 181.61
26%|██▌ | 321/1230 [6:17:16<17:32:05, 69.45s/it]
{'loss': 1.1572, 'learning_rate': 1.7331371582736864e-05, 'epoch': 0.26}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3941
[2024-07-31 08:29:19,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3830.31 | bwd_microstep: 5200.65 | bwd_inner_microstep: 5180.00 | bwd_allreduce_microstep: 20.58 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823
[2024-07-31 08:29:27,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3288.46 | bwd_microstep: 5027.86 | bwd_inner_microstep: 4987.69 | bwd_allreduce_microstep: 40.09 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198
[2024-07-31 08:29:36,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.86 | bwd_microstep: 5494.92 | bwd_inner_microstep: 5070.78 | bwd_allreduce_microstep: 424.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 08:29:45,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.78 | bwd_microstep: 4822.98 | bwd_inner_microstep: 4780.26 | bwd_allreduce_microstep: 42.65 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197
[2024-07-31 08:29:53,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.65 | bwd_microstep: 5148.89 | bwd_inner_microstep: 4749.86 | bwd_allreduce_microstep: 398.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 08:30:02,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.48 | bwd_microstep: 5016.35 | bwd_inner_microstep: 4961.93 | bwd_allreduce_microstep: 54.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699
[2024-07-31 08:30:10,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.65 | bwd_microstep: 5025.16 | bwd_inner_microstep: 4969.86 | bwd_allreduce_microstep: 55.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 08:30:19,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74
[2024-07-31 08:30:19,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.93 | bwd_microstep: 5009.63 | bwd_inner_microstep: 4960.51 | bwd_allreduce_microstep: 49.06 | step_microstep: 182.31
[2024-07-31 08:30:19,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28109.05 | bwd: 40746.42 | bwd_inner: 39660.84 | bwd_allreduce: 1085.09 | step: 182.89
26%|██▌ | 322/1230 [6:18:25<17:29:45, 69.37s/it]
{'loss': 1.2267, 'learning_rate': 1.731343724074957e-05, 'epoch': 0.26}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3933
[2024-07-31 08:30:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.92 | bwd_microstep: 5230.78 | bwd_inner_microstep: 5196.27 | bwd_allreduce_microstep: 34.45 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3942
[2024-07-31 08:30:37,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3340.57 | bwd_microstep: 5197.68 | bwd_inner_microstep: 5148.77 | bwd_allreduce_microstep: 48.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793
[2024-07-31 08:30:45,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.61 | bwd_microstep: 4826.09 | bwd_inner_microstep: 4806.69 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730
[2024-07-31 08:30:53,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.83 | bwd_microstep: 4988.08 | bwd_inner_microstep: 4968.78 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 08:31:02,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.27 | bwd_microstep: 5207.92 | bwd_inner_microstep: 4803.40 | bwd_allreduce_microstep: 404.45 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3825
[2024-07-31 08:31:10,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3142.62 | bwd_microstep: 4911.91 | bwd_inner_microstep: 4881.47 | bwd_allreduce_microstep: 30.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 08:31:19,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.90 | bwd_microstep: 4999.14 | bwd_inner_microstep: 4944.76 | bwd_allreduce_microstep: 54.32 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2105
[2024-07-31 08:31:28,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 08:31:28,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.72 | bwd_microstep: 5204.07 | bwd_inner_microstep: 4798.96 | bwd_allreduce_microstep: 405.04 | step_microstep: 181.54
[2024-07-31 08:31:28,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27824.33 | bwd: 40565.65 | bwd_inner: 39549.04 | bwd_allreduce: 1016.12 | step: 182.12
26%|██▋ | 323/1230 [6:19:34<17:25:41, 69.17s/it]
{'loss': 1.2196, 'learning_rate': 1.7295452183328317e-05, 'epoch': 0.26}
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3540
[2024-07-31 08:31:37,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.38 | bwd_microstep: 5513.67 | bwd_inner_microstep: 5368.89 | bwd_allreduce_microstep: 144.71 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305
[2024-07-31 08:31:46,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.58 | bwd_microstep: 5285.40 | bwd_inner_microstep: 4875.12 | bwd_allreduce_microstep: 410.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783
[2024-07-31 08:31:54,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.78 | bwd_microstep: 4828.25 | bwd_inner_microstep: 4808.84 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755
[2024-07-31 08:32:03,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.56 | bwd_microstep: 5059.38 | bwd_inner_microstep: 5029.98 | bwd_allreduce_microstep: 29.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621
[2024-07-31 08:32:12,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.06 | bwd_microstep: 5161.55 | bwd_inner_microstep: 5084.84 | bwd_allreduce_microstep: 76.64 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3660
[2024-07-31 08:32:20,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.79 | bwd_microstep: 4887.72 | bwd_inner_microstep: 4866.35 | bwd_allreduce_microstep: 21.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719
[2024-07-31 08:32:29,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.27 | bwd_microstep: 5134.93 | bwd_inner_microstep: 5083.23 | bwd_allreduce_microstep: 51.64 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675
[2024-07-31 08:32:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 08:32:38,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.27 | bwd_microstep: 4887.76 | bwd_inner_microstep: 4868.39 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.77
[2024-07-31 08:32:38,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28881.60 | bwd: 40758.65 | bwd_inner: 39985.59 | bwd_allreduce: 772.56 | step: 182.45
26%|██▋ | 324/1230 [6:20:44<17:28:11, 69.42s/it]
{'loss': 1.2231, 'learning_rate': 1.7277416535191478e-05, 'epoch': 0.26}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4073
[2024-07-31 08:32:47,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3888.95 | bwd_microstep: 5433.77 | bwd_inner_microstep: 5398.57 | bwd_allreduce_microstep: 35.13 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2804
[2024-07-31 08:32:56,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.37 | bwd_microstep: 5205.63 | bwd_inner_microstep: 4800.14 | bwd_allreduce_microstep: 405.42 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310
[2024-07-31 08:33:05,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.10 | bwd_microstep: 5135.75 | bwd_inner_microstep: 4737.37 | bwd_allreduce_microstep: 398.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786
[2024-07-31 08:33:13,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.76 | bwd_microstep: 5021.33 | bwd_inner_microstep: 5001.86 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823
[2024-07-31 08:33:22,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.83 | bwd_microstep: 5057.40 | bwd_inner_microstep: 5038.04 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697
[2024-07-31 08:33:31,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.42 | bwd_microstep: 5308.34 | bwd_inner_microstep: 5214.49 | bwd_allreduce_microstep: 93.77 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2149
[2024-07-31 08:33:40,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.12 | bwd_microstep: 5245.91 | bwd_inner_microstep: 4838.33 | bwd_allreduce_microstep: 407.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670
[2024-07-31 08:33:49,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77
[2024-07-31 08:33:49,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.60 | bwd_microstep: 4863.81 | bwd_inner_microstep: 4844.45 | bwd_allreduce_microstep: 19.29 | step_microstep: 182.26
[2024-07-31 08:33:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29330.07 | bwd: 41271.92 | bwd_inner: 39873.19 | bwd_allreduce: 1398.22 | step: 182.83
26%|██▋ | 325/1230 [6:21:55<17:33:54, 69.87s/it]
{'loss': 1.1771, 'learning_rate': 1.7259330421408247e-05, 'epoch': 0.26}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3863
[2024-07-31 08:33:58,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.98 | bwd_microstep: 5361.31 | bwd_inner_microstep: 5271.21 | bwd_allreduce_microstep: 90.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 08:34:07,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.01 | bwd_microstep: 5320.49 | bwd_inner_microstep: 5226.95 | bwd_allreduce_microstep: 93.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320
[2024-07-31 08:34:16,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.68 | bwd_microstep: 5375.82 | bwd_inner_microstep: 4960.80 | bwd_allreduce_microstep: 414.94 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808
[2024-07-31 08:34:25,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.98 | bwd_microstep: 5342.32 | bwd_inner_microstep: 5269.78 | bwd_allreduce_microstep: 72.47 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788
[2024-07-31 08:34:34,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.30 | bwd_microstep: 5076.54 | bwd_inner_microstep: 5053.30 | bwd_allreduce_microstep: 23.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676
[2024-07-31 08:34:42,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.74 | bwd_microstep: 5125.77 | bwd_inner_microstep: 5057.07 | bwd_allreduce_microstep: 68.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669
[2024-07-31 08:34:50,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.97 | bwd_microstep: 4839.56 | bwd_inner_microstep: 4796.28 | bwd_allreduce_microstep: 43.21 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696
[2024-07-31 08:34:59,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 08:34:59,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.59 | bwd_microstep: 4700.45 | bwd_inner_microstep: 4681.09 | bwd_allreduce_microstep: 19.29 | step_microstep: 181.52
[2024-07-31 08:34:59,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28335.16 | bwd: 41142.25 | bwd_inner: 40316.42 | bwd_allreduce: 825.34 | step: 182.11
27%|██▋ | 326/1230 [6:23:04<17:32:27, 69.85s/it]
{'loss': 1.1737, 'learning_rate': 1.7241193967397784e-05, 'epoch': 0.27}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3889
[2024-07-31 08:35:08,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.31 | bwd_microstep: 5258.75 | bwd_inner_microstep: 5191.65 | bwd_allreduce_microstep: 67.04 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310
[2024-07-31 08:35:17,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.88 | bwd_microstep: 5393.14 | bwd_inner_microstep: 4975.26 | bwd_allreduce_microstep: 417.82 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3731
[2024-07-31 08:35:25,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.06 | bwd_microstep: 5165.34 | bwd_inner_microstep: 5084.73 | bwd_allreduce_microstep: 80.54 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 08:35:34,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.53 | bwd_microstep: 4993.49 | bwd_inner_microstep: 4974.08 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712
[2024-07-31 08:35:43,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.46 | bwd_microstep: 5192.64 | bwd_inner_microstep: 5117.25 | bwd_allreduce_microstep: 75.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687
[2024-07-31 08:35:52,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.85 | bwd_microstep: 5018.96 | bwd_inner_microstep: 4980.73 | bwd_allreduce_microstep: 38.16 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646
[2024-07-31 08:36:00,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.38 | bwd_microstep: 5161.71 | bwd_inner_microstep: 5078.34 | bwd_allreduce_microstep: 83.31 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3800
[2024-07-31 08:36:09,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 08:36:09,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.83 | bwd_microstep: 4946.19 | bwd_inner_microstep: 4911.77 | bwd_allreduce_microstep: 34.35 | step_microstep: 224.09
[2024-07-31 08:36:09,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29166.20 | bwd: 41130.22 | bwd_inner: 40313.74 | bwd_allreduce: 815.98 | step: 224.67
27%|██▋ | 327/1230 [6:24:15<17:34:58, 70.10s/it]
{'loss': 1.2136, 'learning_rate': 1.7223007298928322e-05, 'epoch': 0.27}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2260
[2024-07-31 08:36:18,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3405.63 | bwd_microstep: 5296.61 | bwd_inner_microstep: 4893.12 | bwd_allreduce_microstep: 403.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 08:36:27,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.78 | bwd_microstep: 5318.53 | bwd_inner_microstep: 5213.48 | bwd_allreduce_microstep: 104.98 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179
[2024-07-31 08:36:35,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2981.79 | bwd_microstep: 4824.51 | bwd_inner_microstep: 4450.09 | bwd_allreduce_microstep: 374.35 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3832
[2024-07-31 08:36:44,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.29 | bwd_microstep: 5103.16 | bwd_inner_microstep: 5059.35 | bwd_allreduce_microstep: 43.74 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845
[2024-07-31 08:36:52,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.66 | bwd_microstep: 5123.77 | bwd_inner_microstep: 5081.08 | bwd_allreduce_microstep: 42.62 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2171
[2024-07-31 08:37:01,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.97 | bwd_microstep: 5241.80 | bwd_inner_microstep: 4832.72 | bwd_allreduce_microstep: 409.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702
[2024-07-31 08:37:10,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.15 | bwd_microstep: 4887.45 | bwd_inner_microstep: 4866.79 | bwd_allreduce_microstep: 20.58 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683
[2024-07-31 08:37:19,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 08:37:19,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.91 | bwd_microstep: 5188.10 | bwd_inner_microstep: 5114.68 | bwd_allreduce_microstep: 73.36 | step_microstep: 181.47
[2024-07-31 08:37:19,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28159.09 | bwd: 40983.91 | bwd_inner: 39511.25 | bwd_allreduce: 1472.15 | step: 182.05
27%|██▋ | 328/1230 [6:25:25<17:31:04, 69.92s/it]
{'loss': 1.2632, 'learning_rate': 1.7204770542116326e-05, 'epoch': 0.27}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3853
[2024-07-31 08:37:28,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.22 | bwd_microstep: 5333.81 | bwd_inner_microstep: 5267.52 | bwd_allreduce_microstep: 66.21 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259
[2024-07-31 08:37:36,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.03 | bwd_microstep: 5302.10 | bwd_inner_microstep: 4892.73 | bwd_allreduce_microstep: 409.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3893
[2024-07-31 08:37:45,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3812.65 | bwd_microstep: 5126.87 | bwd_inner_microstep: 5107.52 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203
[2024-07-31 08:37:54,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.20 | bwd_microstep: 5207.03 | bwd_inner_microstep: 4803.54 | bwd_allreduce_microstep: 403.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608
[2024-07-31 08:38:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3358.99 | bwd_microstep: 4987.16 | bwd_inner_microstep: 4934.41 | bwd_allreduce_microstep: 52.68 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157
[2024-07-31 08:38:11,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.81 | bwd_microstep: 5224.23 | bwd_inner_microstep: 4817.55 | bwd_allreduce_microstep: 406.61 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708
[2024-07-31 08:38:20,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.62 | bwd_microstep: 4890.04 | bwd_inner_microstep: 4870.69 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 08:38:29,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81
[2024-07-31 08:38:29,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.27 | bwd_microstep: 4987.28 | bwd_inner_microstep: 4967.71 | bwd_allreduce_microstep: 19.50 | step_microstep: 181.55
[2024-07-31 08:38:29,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28755.70 | bwd: 41058.51 | bwd_inner: 39661.61 | bwd_allreduce: 1396.39 | step: 182.13
27%|██▋ | 329/1230 [6:26:35<17:30:56, 69.98s/it]
{'loss': 1.2171, 'learning_rate': 1.7186483823425582e-05, 'epoch': 0.27}
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3849
[2024-07-31 08:38:38,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.34 | bwd_microstep: 5408.14 | bwd_inner_microstep: 5343.00 | bwd_allreduce_microstep: 65.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593
[2024-07-31 08:38:47,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.12 | bwd_microstep: 5142.12 | bwd_inner_microstep: 5068.49 | bwd_allreduce_microstep: 73.57 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3074
[2024-07-31 08:38:56,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.43 | bwd_microstep: 5157.51 | bwd_inner_microstep: 4872.70 | bwd_allreduce_microstep: 284.74 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3900
[2024-07-31 08:39:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3808.74 | bwd_microstep: 5127.07 | bwd_inner_microstep: 5107.67 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774
[2024-07-31 08:39:13,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.87 | bwd_microstep: 5005.68 | bwd_inner_microstep: 4986.42 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2175
[2024-07-31 08:39:22,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.14 | bwd_microstep: 5151.60 | bwd_inner_microstep: 4749.99 | bwd_allreduce_microstep: 401.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689
[2024-07-31 08:39:31,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.84 | bwd_microstep: 4880.24 | bwd_inner_microstep: 4860.80 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683
[2024-07-31 08:39:39,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 08:39:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.77 | bwd_microstep: 4926.56 | bwd_inner_microstep: 4900.08 | bwd_allreduce_microstep: 26.41 | step_microstep: 181.57
[2024-07-31 08:39:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29362.16 | bwd: 40798.90 | bwd_inner: 39889.09 | bwd_allreduce: 909.31 | step: 182.16
27%|██▋ | 330/1230 [6:27:45<17:32:04, 70.14s/it]
{'loss': 1.2089, 'learning_rate': 1.7168147269666357e-05, 'epoch': 0.27}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3916
[2024-07-31 08:39:48,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.33 | bwd_microstep: 5202.68 | bwd_inner_microstep: 5183.54 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571
[2024-07-31 08:39:57,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.29 | bwd_microstep: 5150.68 | bwd_inner_microstep: 5063.74 | bwd_allreduce_microstep: 86.88 | step_microstep: 0.18
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2058
[2024-07-31 08:40:06,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.37 | bwd_microstep: 5190.55 | bwd_inner_microstep: 4789.52 | bwd_allreduce_microstep: 400.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 08:40:15,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.79 | bwd_microstep: 5183.05 | bwd_inner_microstep: 5100.28 | bwd_allreduce_microstep: 82.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230
[2024-07-31 08:40:23,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.12 | bwd_microstep: 5163.47 | bwd_inner_microstep: 4762.74 | bwd_allreduce_microstep: 400.67 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751
[2024-07-31 08:40:32,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.07 | bwd_microstep: 4954.80 | bwd_inner_microstep: 4921.16 | bwd_allreduce_microstep: 33.57 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237
[2024-07-31 08:40:41,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.65 | bwd_microstep: 5163.14 | bwd_inner_microstep: 4761.79 | bwd_allreduce_microstep: 401.27 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2114
[2024-07-31 08:40:49,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 08:40:49,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.96 | bwd_microstep: 5095.43 | bwd_inner_microstep: 4699.57 | bwd_allreduce_microstep: 395.78 | step_microstep: 182.53
[2024-07-31 08:40:49,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28652.49 | bwd: 41103.79 | bwd_inner: 39282.27 | bwd_allreduce: 1821.01 | step: 183.22
27%|██▋ | 331/1230 [6:28:55<17:30:40, 70.12s/it]
{'loss': 1.1739, 'learning_rate': 1.714976100799449e-05, 'epoch': 0.27}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 08:40:59,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.06 | bwd_microstep: 5414.55 | bwd_inner_microstep: 5307.32 | bwd_allreduce_microstep: 107.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4058
[2024-07-31 08:41:07,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.27 | bwd_microstep: 5156.54 | bwd_inner_microstep: 5137.22 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234
[2024-07-31 08:41:16,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.78 | bwd_microstep: 5276.61 | bwd_inner_microstep: 4869.12 | bwd_allreduce_microstep: 407.42 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719
[2024-07-31 08:41:25,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.24 | bwd_microstep: 5048.87 | bwd_inner_microstep: 5021.05 | bwd_allreduce_microstep: 27.76 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2077
[2024-07-31 08:41:34,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.99 | bwd_microstep: 5188.82 | bwd_inner_microstep: 4784.24 | bwd_allreduce_microstep: 404.51 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681
[2024-07-31 08:41:42,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.92 | bwd_microstep: 5038.51 | bwd_inner_microstep: 4984.98 | bwd_allreduce_microstep: 53.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155
[2024-07-31 08:41:51,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.11 | bwd_microstep: 5117.15 | bwd_inner_microstep: 4719.87 | bwd_allreduce_microstep: 397.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694
[2024-07-31 08:42:00,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 08:42:00,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.14 | bwd_microstep: 5060.66 | bwd_inner_microstep: 4997.83 | bwd_allreduce_microstep: 62.76 | step_microstep: 181.49
[2024-07-31 08:42:00,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28853.42 | bwd: 41301.70 | bwd_inner: 39821.56 | bwd_allreduce: 1479.64 | step: 182.08
27%|██▋ | 332/1230 [6:30:06<17:31:07, 70.23s/it]
{'loss': 1.228, 'learning_rate': 1.713132516591053e-05, 'epoch': 0.27}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 08:42:09,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.09 | bwd_microstep: 5363.01 | bwd_inner_microstep: 5343.90 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3805
[2024-07-31 08:42:18,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.74 | bwd_microstep: 5185.26 | bwd_inner_microstep: 5120.70 | bwd_allreduce_microstep: 64.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 08:42:27,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.55 | bwd_microstep: 5195.36 | bwd_inner_microstep: 4791.63 | bwd_allreduce_microstep: 403.66 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2066
[2024-07-31 08:42:36,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.78 | bwd_microstep: 5245.20 | bwd_inner_microstep: 4840.93 | bwd_allreduce_microstep: 404.21 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2089
[2024-07-31 08:42:44,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.16 | bwd_microstep: 5241.53 | bwd_inner_microstep: 4835.16 | bwd_allreduce_microstep: 406.31 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2211
[2024-07-31 08:42:53,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.36 | bwd_microstep: 5082.88 | bwd_inner_microstep: 4685.97 | bwd_allreduce_microstep: 396.83 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183
[2024-07-31 08:43:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.15 | bwd_microstep: 5097.02 | bwd_inner_microstep: 4699.75 | bwd_allreduce_microstep: 397.20 | step_microstep: 0.19
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214
[2024-07-31 08:43:11,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 08:43:11,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.93 | bwd_microstep: 5050.25 | bwd_inner_microstep: 4658.69 | bwd_allreduce_microstep: 391.49 | step_microstep: 182.78
[2024-07-31 08:43:11,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28910.66 | bwd: 41460.50 | bwd_inner: 38976.66 | bwd_allreduce: 2483.34 | step: 183.47
27%|██▋ | 333/1230 [6:31:17<17:32:02, 70.37s/it]
{'loss': 1.1951, 'learning_rate': 1.7112839871258838e-05, 'epoch': 0.27}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282
[2024-07-31 08:43:20,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.65 | bwd_microstep: 5580.26 | bwd_inner_microstep: 5153.07 | bwd_allreduce_microstep: 427.12 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3916
[2024-07-31 08:43:29,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3813.48 | bwd_microstep: 5146.52 | bwd_inner_microstep: 5126.64 | bwd_allreduce_microstep: 19.81 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3845
[2024-07-31 08:43:38,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.28 | bwd_microstep: 5102.72 | bwd_inner_microstep: 5083.36 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740
[2024-07-31 08:43:47,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.83 | bwd_microstep: 5116.56 | bwd_inner_microstep: 5083.25 | bwd_allreduce_microstep: 33.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754
[2024-07-31 08:43:55,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.74 | bwd_microstep: 5038.59 | bwd_inner_microstep: 5012.49 | bwd_allreduce_microstep: 26.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762
[2024-07-31 08:44:04,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.90 | bwd_microstep: 5013.04 | bwd_inner_microstep: 4993.66 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 08:44:13,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.24 | bwd_microstep: 5070.24 | bwd_inner_microstep: 4675.35 | bwd_allreduce_microstep: 394.83 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3695
[2024-07-31 08:44:22,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69
[2024-07-31 08:44:22,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.20 | bwd_microstep: 4946.68 | bwd_inner_microstep: 4887.17 | bwd_allreduce_microstep: 59.45 | step_microstep: 209.00
[2024-07-31 08:44:22,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29476.24 | bwd: 41014.60 | bwd_inner: 40014.93 | bwd_allreduce: 999.17 | step: 209.59
27%|██▋ | 334/1230 [6:32:27<17:33:02, 70.52s/it]
{'loss': 1.2141, 'learning_rate': 1.7094305252226713e-05, 'epoch': 0.27}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3521
[2024-07-31 08:44:30,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3380.29 | bwd_microstep: 4989.64 | bwd_inner_microstep: 4921.25 | bwd_allreduce_microstep: 68.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3782
[2024-07-31 08:44:39,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.41 | bwd_microstep: 5244.99 | bwd_inner_microstep: 5155.99 | bwd_allreduce_microstep: 88.94 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807
[2024-07-31 08:44:48,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.38 | bwd_microstep: 5141.49 | bwd_inner_microstep: 5098.73 | bwd_allreduce_microstep: 42.70 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 08:44:56,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.36 | bwd_microstep: 5124.35 | bwd_inner_microstep: 5046.89 | bwd_allreduce_microstep: 77.39 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178
[2024-07-31 08:45:05,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.49 | bwd_microstep: 5236.27 | bwd_inner_microstep: 4829.54 | bwd_allreduce_microstep: 406.67 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 08:45:14,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.56 | bwd_microstep: 4948.14 | bwd_inner_microstep: 4901.16 | bwd_allreduce_microstep: 46.91 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 599
[2024-07-31 08:45:22,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.91 | bwd_microstep: 5258.09 | bwd_inner_microstep: 4853.64 | bwd_allreduce_microstep: 404.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 08:45:31,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 08:45:31,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.89 | bwd_microstep: 5044.25 | bwd_inner_microstep: 4982.32 | bwd_allreduce_microstep: 61.86 | step_microstep: 182.34
[2024-07-31 08:45:31,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28382.22 | bwd: 40987.21 | bwd_inner: 39789.46 | bwd_allreduce: 1197.28 | step: 182.91
27%|██▋ | 335/1230 [6:33:37<17:28:11, 70.27s/it]
{'loss': 1.2067, 'learning_rate': 1.7075721437343488e-05, 'epoch': 0.27}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3904
[2024-07-31 08:45:40,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.44 | bwd_microstep: 5370.04 | bwd_inner_microstep: 5282.66 | bwd_allreduce_microstep: 87.32 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3932
[2024-07-31 08:45:49,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.57 | bwd_microstep: 5156.03 | bwd_inner_microstep: 5136.62 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604
[2024-07-31 08:45:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.83 | bwd_microstep: 5147.87 | bwd_inner_microstep: 5072.33 | bwd_allreduce_microstep: 75.47 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620
[2024-07-31 08:46:07,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.24 | bwd_microstep: 5316.77 | bwd_inner_microstep: 5221.13 | bwd_allreduce_microstep: 95.57 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3749
[2024-07-31 08:46:16,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.63 | bwd_microstep: 4934.42 | bwd_inner_microstep: 4915.08 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.21
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704
[2024-07-31 08:46:24,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.44 | bwd_microstep: 5158.10 | bwd_inner_microstep: 5086.11 | bwd_allreduce_microstep: 71.92 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147
[2024-07-31 08:46:33,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.45 | bwd_microstep: 5077.62 | bwd_inner_microstep: 4684.97 | bwd_allreduce_microstep: 392.58 | step_microstep: 0.21
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672
[2024-07-31 08:46:42,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 08:46:42,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.72 | bwd_microstep: 4899.09 | bwd_inner_microstep: 4878.32 | bwd_allreduce_microstep: 20.69 | step_microstep: 181.83
[2024-07-31 08:46:42,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29152.24 | bwd: 41059.93 | bwd_inner: 40277.17 | bwd_allreduce: 782.24 | step: 182.68
27%|██▋ | 336/1230 [6:34:48<17:28:15, 70.35s/it]
{'loss': 1.1405, 'learning_rate': 1.705708855547966e-05, 'epoch': 0.27}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2645
[2024-07-31 08:46:51,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.15 | bwd_microstep: 5647.74 | bwd_inner_microstep: 5213.04 | bwd_allreduce_microstep: 434.63 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579
[2024-07-31 08:47:00,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.04 | bwd_microstep: 5213.52 | bwd_inner_microstep: 5124.58 | bwd_allreduce_microstep: 88.87 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263
[2024-07-31 08:47:08,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3057.04 | bwd_microstep: 5059.20 | bwd_inner_microstep: 4670.43 | bwd_allreduce_microstep: 388.70 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741
[2024-07-31 08:47:17,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.81 | bwd_microstep: 5039.97 | bwd_inner_microstep: 5011.49 | bwd_allreduce_microstep: 28.41 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701
[2024-07-31 08:47:26,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.19 | bwd_microstep: 4972.57 | bwd_inner_microstep: 4938.89 | bwd_allreduce_microstep: 33.61 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191
[2024-07-31 08:47:34,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3044.19 | bwd_microstep: 5041.88 | bwd_inner_microstep: 4654.11 | bwd_allreduce_microstep: 387.70 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 08:47:42,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.49 | bwd_microstep: 4899.25 | bwd_inner_microstep: 4879.86 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3693
[2024-07-31 08:47:52,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 08:47:52,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3888.64 | bwd_microstep: 5141.29 | bwd_inner_microstep: 5057.03 | bwd_allreduce_microstep: 84.19 | step_microstep: 182.29
[2024-07-31 08:47:52,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28447.44 | bwd: 41015.40 | bwd_inner: 39549.35 | bwd_allreduce: 1465.53 | step: 183.00
27%|██▋ | 337/1230 [6:35:57<17:24:38, 70.19s/it]
{'loss': 1.2045, 'learning_rate': 1.7038406735845967e-05, 'epoch': 0.27}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838
[2024-07-31 08:48:01,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.67 | bwd_microstep: 5354.13 | bwd_inner_microstep: 5281.36 | bwd_allreduce_microstep: 72.69 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568
[2024-07-31 08:48:10,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.10 | bwd_microstep: 5310.08 | bwd_inner_microstep: 5203.74 | bwd_allreduce_microstep: 106.28 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788
[2024-07-31 08:48:18,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.52 | bwd_microstep: 5127.36 | bwd_inner_microstep: 5080.98 | bwd_allreduce_microstep: 46.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792
[2024-07-31 08:48:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.61 | bwd_microstep: 5177.84 | bwd_inner_microstep: 5124.64 | bwd_allreduce_microstep: 53.14 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191
[2024-07-31 08:48:36,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.19 | bwd_microstep: 5251.52 | bwd_inner_microstep: 4844.46 | bwd_allreduce_microstep: 407.00 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3657
[2024-07-31 08:48:45,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.07 | bwd_microstep: 5151.38 | bwd_inner_microstep: 5058.82 | bwd_allreduce_microstep: 92.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 08:48:53,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.74 | bwd_microstep: 4897.09 | bwd_inner_microstep: 4877.74 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686
[2024-07-31 08:49:02,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 08:49:02,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.83 | bwd_microstep: 4921.41 | bwd_inner_microstep: 4896.67 | bwd_allreduce_microstep: 24.67 | step_microstep: 181.49
[2024-07-31 08:49:02,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29095.63 | bwd: 41190.78 | bwd_inner: 40368.34 | bwd_allreduce: 821.96 | step: 182.18
27%|██▋ | 338/1230 [6:37:08<17:25:23, 70.32s/it]
{'loss': 1.1631, 'learning_rate': 1.7019676107992523e-05, 'epoch': 0.27}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3936
[2024-07-31 08:49:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3820.52 | bwd_microstep: 5185.24 | bwd_inner_microstep: 5163.89 | bwd_allreduce_microstep: 21.28 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3812
[2024-07-31 08:49:20,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.95 | bwd_microstep: 5303.34 | bwd_inner_microstep: 5251.33 | bwd_allreduce_microstep: 51.94 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3773
[2024-07-31 08:49:29,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.82 | bwd_microstep: 5147.20 | bwd_inner_microstep: 5067.13 | bwd_allreduce_microstep: 79.99 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197
[2024-07-31 08:49:38,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.67 | bwd_microstep: 5289.73 | bwd_inner_microstep: 4879.41 | bwd_allreduce_microstep: 410.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647
[2024-07-31 08:49:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.36 | bwd_microstep: 5181.47 | bwd_inner_microstep: 5103.93 | bwd_allreduce_microstep: 77.47 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2929
[2024-07-31 08:49:56,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.16 | bwd_microstep: 5230.58 | bwd_inner_microstep: 4822.59 | bwd_allreduce_microstep: 407.91 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691
[2024-07-31 08:50:05,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.45 | bwd_microstep: 5059.18 | bwd_inner_microstep: 5014.95 | bwd_allreduce_microstep: 44.17 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151
[2024-07-31 08:50:13,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 08:50:13,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.16 | bwd_microstep: 5093.81 | bwd_inner_microstep: 4698.60 | bwd_allreduce_microstep: 395.14 | step_microstep: 184.42
[2024-07-31 08:50:13,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29307.00 | bwd: 41490.53 | bwd_inner: 40001.78 | bwd_allreduce: 1488.26 | step: 185.12
28%|██▊ | 339/1230 [6:38:19<17:27:51, 70.56s/it]
{'loss': 1.213, 'learning_rate': 1.70008968018079e-05, 'epoch': 0.28}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 08:50:23,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.01 | bwd_microstep: 5571.42 | bwd_inner_microstep: 5529.53 | bwd_allreduce_microstep: 41.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3899
[2024-07-31 08:50:31,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.17 | bwd_microstep: 5109.17 | bwd_inner_microstep: 5071.96 | bwd_allreduce_microstep: 37.14 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3935
[2024-07-31 08:50:40,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.65 | bwd_microstep: 5222.48 | bwd_inner_microstep: 5163.32 | bwd_allreduce_microstep: 59.09 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757
[2024-07-31 08:50:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.08 | bwd_microstep: 5030.15 | bwd_inner_microstep: 4991.73 | bwd_allreduce_microstep: 38.35 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3100
[2024-07-31 08:50:58,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.08 | bwd_microstep: 5140.29 | bwd_inner_microstep: 4877.39 | bwd_allreduce_microstep: 262.84 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659
[2024-07-31 08:51:07,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.74 | bwd_microstep: 5250.39 | bwd_inner_microstep: 5160.76 | bwd_allreduce_microstep: 89.57 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3641
[2024-07-31 08:51:15,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.98 | bwd_microstep: 5012.28 | bwd_inner_microstep: 4938.20 | bwd_allreduce_microstep: 74.00 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 08:51:24,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 08:51:24,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.55 | bwd_microstep: 5136.81 | bwd_inner_microstep: 4737.58 | bwd_allreduce_microstep: 399.16 | step_microstep: 182.47
[2024-07-31 08:51:24,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28800.17 | bwd: 41472.97 | bwd_inner: 40470.41 | bwd_allreduce: 1002.06 | step: 183.07
28%|██▊ | 340/1230 [6:39:30<17:26:53, 70.58s/it]
{'loss': 1.1659, 'learning_rate': 1.6982068947518235e-05, 'epoch': 0.28}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3944
[2024-07-31 08:51:33,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.40 | bwd_microstep: 5156.25 | bwd_inner_microstep: 5123.81 | bwd_allreduce_microstep: 32.37 | step_microstep: 0.08
dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2801
[2024-07-31 08:51:41,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3098.52 | bwd_microstep: 5039.57 | bwd_inner_microstep: 4649.53 | bwd_allreduce_microstep: 389.97 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811
[2024-07-31 08:51:50,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.29 | bwd_microstep: 5062.84 | bwd_inner_microstep: 5038.74 | bwd_allreduce_microstep: 24.03 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777
[2024-07-31 08:51:59,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.71 | bwd_microstep: 5111.83 | bwd_inner_microstep: 5069.49 | bwd_allreduce_microstep: 42.27 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615
[2024-07-31 08:52:07,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.69 | bwd_microstep: 5131.13 | bwd_inner_microstep: 5060.02 | bwd_allreduce_microstep: 71.03 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139
[2024-07-31 08:52:16,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.46 | bwd_microstep: 5042.64 | bwd_inner_microstep: 4651.38 | bwd_allreduce_microstep: 391.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3667
[2024-07-31 08:52:24,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.82 | bwd_microstep: 5049.56 | bwd_inner_microstep: 4975.60 | bwd_allreduce_microstep: 73.89 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684
[2024-07-31 08:52:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 08:52:33,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.78 | bwd_microstep: 4968.49 | bwd_inner_microstep: 4919.56 | bwd_allreduce_microstep: 48.87 | step_microstep: 181.85
[2024-07-31 08:52:33,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28350.57 | bwd: 40562.29 | bwd_inner: 39488.08 | bwd_allreduce: 1073.72 | step: 182.45
28%|██▊ | 341/1230 [6:40:39<17:19:47, 70.18s/it]
{'loss': 1.187, 'learning_rate': 1.6963192675686312e-05, 'epoch': 0.28}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3906
[2024-07-31 08:52:42,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.46 | bwd_microstep: 5466.00 | bwd_inner_microstep: 5393.88 | bwd_allreduce_microstep: 72.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 08:52:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.73 | bwd_microstep: 5235.31 | bwd_inner_microstep: 5152.78 | bwd_allreduce_microstep: 82.46 | step_microstep: 0.19
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3743
[2024-07-31 08:53:00,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.09 | bwd_microstep: 5135.01 | bwd_inner_microstep: 5075.96 | bwd_allreduce_microstep: 58.97 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2066
[2024-07-31 08:53:08,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.50 | bwd_microstep: 4864.75 | bwd_inner_microstep: 4490.66 | bwd_allreduce_microstep: 374.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716
[2024-07-31 08:53:16,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3244.37 | bwd_microstep: 4864.60 | bwd_inner_microstep: 4834.14 | bwd_allreduce_microstep: 30.39 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185
[2024-07-31 08:53:24,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3007.14 | bwd_microstep: 4871.75 | bwd_inner_microstep: 4498.44 | bwd_allreduce_microstep: 373.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648
[2024-07-31 08:53:32,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.32 | bwd_microstep: 4958.76 | bwd_inner_microstep: 4911.60 | bwd_allreduce_microstep: 47.08 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2118
[2024-07-31 08:53:41,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 08:53:41,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.27 | bwd_microstep: 5130.88 | bwd_inner_microstep: 4732.09 | bwd_allreduce_microstep: 398.72 | step_microstep: 181.43
[2024-07-31 08:53:41,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27227.78 | bwd: 40527.04 | bwd_inner: 39089.48 | bwd_allreduce: 1437.07 | step: 182.13
28%|██▊ | 342/1230 [6:41:47<17:09:19, 69.55s/it]
{'loss': 1.2258, 'learning_rate': 1.694426811721069e-05, 'epoch': 0.28}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4050
[2024-07-31 08:53:50,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3871.85 | bwd_microstep: 5333.11 | bwd_inner_microstep: 5314.02 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3871
[2024-07-31 08:53:59,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.17 | bwd_microstep: 5110.98 | bwd_inner_microstep: 5091.57 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620
[2024-07-31 08:54:08,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.30 | bwd_microstep: 5167.87 | bwd_inner_microstep: 5089.45 | bwd_allreduce_microstep: 78.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 08:54:17,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.98 | bwd_microstep: 5044.36 | bwd_inner_microstep: 4980.20 | bwd_allreduce_microstep: 64.09 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310
[2024-07-31 08:54:25,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.73 | bwd_microstep: 5113.11 | bwd_inner_microstep: 4713.96 | bwd_allreduce_microstep: 399.08 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734
[2024-07-31 08:54:34,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.04 | bwd_microstep: 4999.61 | bwd_inner_microstep: 4980.19 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3643
[2024-07-31 08:54:43,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.32 | bwd_microstep: 5066.96 | bwd_inner_microstep: 4980.75 | bwd_allreduce_microstep: 86.14 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2098
[2024-07-31 08:54:52,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 08:54:52,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.51 | bwd_microstep: 5229.57 | bwd_inner_microstep: 4822.81 | bwd_allreduce_microstep: 406.69 | step_microstep: 181.71
[2024-07-31 08:54:52,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29157.83 | bwd: 41065.55 | bwd_inner: 39972.88 | bwd_allreduce: 1092.17 | step: 182.30
28%|██▊ | 343/1230 [6:42:58<17:12:37, 69.85s/it]
{'loss': 1.2047, 'learning_rate': 1.692529540332476e-05, 'epoch': 0.28}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3983
[2024-07-31 08:55:01,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.71 | bwd_microstep: 5238.87 | bwd_inner_microstep: 5219.79 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764
[2024-07-31 08:55:09,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3377.23 | bwd_microstep: 5053.68 | bwd_inner_microstep: 5014.93 | bwd_allreduce_microstep: 38.68 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777
[2024-07-31 08:55:18,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.78 | bwd_microstep: 5133.77 | bwd_inner_microstep: 5087.75 | bwd_allreduce_microstep: 45.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806
[2024-07-31 08:55:27,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3606.67 | bwd_microstep: 5139.34 | bwd_inner_microstep: 5094.19 | bwd_allreduce_microstep: 45.08 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2184 [2024-07-31 08:55:36,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.04 | bwd_microstep: 5235.66 | bwd_inner_microstep: 4828.77 | bwd_allreduce_microstep: 406.82 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 08:55:44,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.36 | bwd_microstep: 5030.83 | bwd_inner_microstep: 4970.71 | bwd_allreduce_microstep: 60.05 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2127 [2024-07-31 08:55:53,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.13 | bwd_microstep: 5078.45 | bwd_inner_microstep: 4683.65 | bwd_allreduce_microstep: 394.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 08:56:01,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-31 08:56:01,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.24 | bwd_microstep: 4716.98 | bwd_inner_microstep: 4692.69 | bwd_allreduce_microstep: 24.22 | step_microstep: 181.88 [2024-07-31 08:56:01,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28320.07 | bwd: 40627.55 | bwd_inner: 39592.42 | bwd_allreduce: 1034.64 | step: 182.47 28%|██▊ | 344/1230 [6:44:07<17:08:55, 69.68s/it] {'loss': 1.2238, 'learning_rate': 1.690627466559585e-05, 'epoch': 0.28} 28%|██▊ | 344/1230 [6:44:07<17:08:55, 69.68s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4083 [2024-07-31 08:56:10,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3880.06 | bwd_microstep: 5376.73 | bwd_inner_microstep: 5357.65 | 
bwd_allreduce_microstep: 19.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 08:56:19,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.26 | bwd_microstep: 5182.89 | bwd_inner_microstep: 5126.83 | bwd_allreduce_microstep: 55.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-31 08:56:27,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.27 | bwd_microstep: 4997.53 | bwd_inner_microstep: 4613.22 | bwd_allreduce_microstep: 384.25 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 08:56:36,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.07 | bwd_microstep: 5150.46 | bwd_inner_microstep: 5067.05 | bwd_allreduce_microstep: 83.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 08:56:45,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.20 | bwd_microstep: 5004.79 | bwd_inner_microstep: 4985.01 | bwd_allreduce_microstep: 19.70 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 08:56:54,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.47 | bwd_microstep: 5244.91 | bwd_inner_microstep: 4835.69 | bwd_allreduce_microstep: 409.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 08:57:02,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.38 | bwd_microstep: 5180.00 | bwd_inner_microstep: 5103.60 | bwd_allreduce_microstep: 76.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 08:57:11,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 
[2024-07-31 08:57:11,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.68 | bwd_microstep: 5108.08 | bwd_inner_microstep: 4711.24 | bwd_allreduce_microstep: 396.77 | step_microstep: 182.88 [2024-07-31 08:57:11,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28533.29 | bwd: 41245.39 | bwd_inner: 39800.23 | bwd_allreduce: 1444.65 | step: 183.58 28%|██▊ | 345/1230 [6:45:17<17:09:40, 69.81s/it] {'loss': 1.196, 'learning_rate': 1.688720603592432e-05, 'epoch': 0.28} 28%|██▊ | 345/1230 [6:45:17<17:09:40, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2338 [2024-07-31 08:57:20,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.38 | bwd_microstep: 5538.26 | bwd_inner_microstep: 5111.91 | bwd_allreduce_microstep: 426.28 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3798 [2024-07-31 08:57:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.42 | bwd_microstep: 5100.20 | bwd_inner_microstep: 5063.43 | bwd_allreduce_microstep: 36.70 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3850 [2024-07-31 08:57:38,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.22 | bwd_microstep: 5119.68 | bwd_inner_microstep: 5066.21 | bwd_allreduce_microstep: 53.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-31 08:57:47,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.05 | bwd_microstep: 5151.25 | bwd_inner_microstep: 5102.62 | bwd_allreduce_microstep: 48.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-31 08:57:56,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.87 | bwd_microstep: 5276.28 | bwd_inner_microstep: 4866.44 | bwd_allreduce_microstep: 
409.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 08:58:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.61 | bwd_microstep: 4912.27 | bwd_inner_microstep: 4888.20 | bwd_allreduce_microstep: 24.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 08:58:13,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.18 | bwd_microstep: 5017.52 | bwd_inner_microstep: 4962.75 | bwd_allreduce_microstep: 54.70 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 08:58:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 08:58:22,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.07 | bwd_microstep: 5072.97 | bwd_inner_microstep: 4680.18 | bwd_allreduce_microstep: 392.72 | step_microstep: 180.97 [2024-07-31 08:58:22,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28889.70 | bwd: 41188.41 | bwd_inner: 39741.67 | bwd_allreduce: 1446.25 | step: 181.55 28%|██▊ | 346/1230 [6:46:27<17:11:09, 69.99s/it] {'loss': 1.2162, 'learning_rate': 1.6868089646542632e-05, 'epoch': 0.28} 28%|██▊ | 346/1230 [6:46:28<17:11:09, 69.99s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4042 [2024-07-31 08:58:31,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.23 | bwd_microstep: 5332.41 | bwd_inner_microstep: 5313.32 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3912 [2024-07-31 08:58:40,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.53 | bwd_microstep: 5332.93 | bwd_inner_microstep: 5283.55 | bwd_allreduce_microstep: 49.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3710 [2024-07-31 08:58:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.00 | bwd_microstep: 5214.75 | bwd_inner_microstep: 5149.06 | bwd_allreduce_microstep: 65.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-31 08:58:58,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.90 | bwd_microstep: 5171.56 | bwd_inner_microstep: 4768.18 | bwd_allreduce_microstep: 403.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 08:59:06,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.35 | bwd_microstep: 5096.87 | bwd_inner_microstep: 5028.08 | bwd_allreduce_microstep: 68.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-31 08:59:15,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.96 | bwd_microstep: 5200.95 | bwd_inner_microstep: 5128.50 | bwd_allreduce_microstep: 72.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3660 [2024-07-31 08:59:24,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.41 | bwd_microstep: 5042.19 | bwd_inner_microstep: 5001.26 | bwd_allreduce_microstep: 40.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 08:59:33,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 08:59:33,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.23 | bwd_microstep: 5062.90 | bwd_inner_microstep: 5015.29 | bwd_allreduce_microstep: 47.54 | step_microstep: 181.81 [2024-07-31 08:59:33,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29522.52 | bwd: 41454.53 | bwd_inner: 40687.19 | bwd_allreduce: 766.85 | step: 182.39 28%|██▊ | 
347/1230 [6:47:39<17:15:50, 70.39s/it] {'loss': 1.2097, 'learning_rate': 1.6848925630014445e-05, 'epoch': 0.28} 28%|██▊ | 347/1230 [6:47:39<17:15:50, 70.39s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4056 [2024-07-31 08:59:42,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.90 | bwd_microstep: 5321.59 | bwd_inner_microstep: 5302.48 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 08:59:51,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.63 | bwd_microstep: 5209.21 | bwd_inner_microstep: 5123.10 | bwd_allreduce_microstep: 86.05 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2059 [2024-07-31 09:00:00,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.38 | bwd_microstep: 5257.84 | bwd_inner_microstep: 4851.31 | bwd_allreduce_microstep: 406.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-31 09:00:09,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.97 | bwd_microstep: 5166.82 | bwd_inner_microstep: 5111.57 | bwd_allreduce_microstep: 55.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 09:00:17,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.58 | bwd_microstep: 4994.81 | bwd_inner_microstep: 4975.39 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 09:00:26,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.39 | bwd_microstep: 5204.24 | bwd_inner_microstep: 5125.71 | bwd_allreduce_microstep: 78.46 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 
2181 [2024-07-31 09:00:35,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.31 | bwd_microstep: 5101.78 | bwd_inner_microstep: 4704.41 | bwd_allreduce_microstep: 397.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 09:00:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 09:00:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.94 | bwd_microstep: 5016.76 | bwd_inner_microstep: 4960.81 | bwd_allreduce_microstep: 55.87 | step_microstep: 181.61 [2024-07-31 09:00:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29094.01 | bwd: 41273.03 | bwd_inner: 40154.74 | bwd_allreduce: 1117.79 | step: 182.20 28%|██▊ | 348/1230 [6:48:50<17:16:02, 70.48s/it] {'loss': 1.1976, 'learning_rate': 1.6829714119233688e-05, 'epoch': 0.28} 28%|██▊ | 348/1230 [6:48:50<17:16:02, 70.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3871 [2024-07-31 09:00:53,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.01 | bwd_microstep: 5375.73 | bwd_inner_microstep: 5307.91 | bwd_allreduce_microstep: 67.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 09:01:01,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.76 | bwd_microstep: 5154.96 | bwd_inner_microstep: 5100.42 | bwd_allreduce_microstep: 54.45 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3733 [2024-07-31 09:01:10,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.43 | bwd_microstep: 5145.06 | bwd_inner_microstep: 5077.92 | bwd_allreduce_microstep: 67.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 09:01:19,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3733.32 | bwd_microstep: 5022.57 | bwd_inner_microstep: 5003.23 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-31 09:01:28,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.53 | bwd_microstep: 5240.11 | bwd_inner_microstep: 4833.00 | bwd_allreduce_microstep: 407.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 09:01:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.01 | bwd_microstep: 5312.43 | bwd_inner_microstep: 5216.53 | bwd_allreduce_microstep: 95.84 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2992 [2024-07-31 09:01:46,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.29 | bwd_microstep: 5201.79 | bwd_inner_microstep: 4796.09 | bwd_allreduce_microstep: 405.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-31 09:01:55,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 09:01:55,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.68 | bwd_microstep: 5138.39 | bwd_inner_microstep: 4740.82 | bwd_allreduce_microstep: 397.50 | step_microstep: 182.54 [2024-07-31 09:01:55,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28985.93 | bwd: 41591.01 | bwd_inner: 40075.86 | bwd_allreduce: 1514.66 | step: 183.22 28%|██▊ | 349/1230 [6:50:00<17:16:46, 70.61s/it] {'loss': 1.2536, 'learning_rate': 1.6810455247423634e-05, 'epoch': 0.28} 28%|██▊ | 349/1230 [6:50:00<17:16:46, 70.61s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-07-31 09:02:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.10 | bwd_microstep: 5237.09 | bwd_inner_microstep: 5196.44 | 
bwd_allreduce_microstep: 40.58 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3628 [2024-07-31 09:02:13,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.20 | bwd_microstep: 5267.34 | bwd_inner_microstep: 5197.48 | bwd_allreduce_microstep: 69.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 09:02:22,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.73 | bwd_microstep: 5302.39 | bwd_inner_microstep: 5230.73 | bwd_allreduce_microstep: 71.59 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2212 [2024-07-31 09:02:30,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.16 | bwd_microstep: 5109.69 | bwd_inner_microstep: 4712.10 | bwd_allreduce_microstep: 397.52 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-31 09:02:38,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.61 | bwd_microstep: 4836.36 | bwd_inner_microstep: 4816.94 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 09:02:47,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.70 | bwd_microstep: 5114.68 | bwd_inner_microstep: 5062.46 | bwd_allreduce_microstep: 52.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-31 09:02:56,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.04 | bwd_microstep: 5102.85 | bwd_inner_microstep: 4705.60 | bwd_allreduce_microstep: 397.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 09:03:04,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 
[2024-07-31 09:03:04,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.44 | bwd_microstep: 4915.08 | bwd_inner_microstep: 4538.73 | bwd_allreduce_microstep: 376.28 | step_microstep: 181.01 [2024-07-31 09:03:04,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28007.89 | bwd: 40885.47 | bwd_inner: 39460.43 | bwd_allreduce: 1424.55 | step: 181.59 28%|██▊ | 350/1230 [6:51:10<17:09:29, 70.19s/it] {'loss': 1.1825, 'learning_rate': 1.6791149148136003e-05, 'epoch': 0.28} 28%|██▊ | 350/1230 [6:51:10<17:09:29, 70.19s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 09:03:13,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3863.13 | bwd_microstep: 5371.56 | bwd_inner_microstep: 5352.48 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3941 [2024-07-31 09:03:22,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3803.95 | bwd_microstep: 5265.15 | bwd_inner_microstep: 5232.04 | bwd_allreduce_microstep: 33.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-31 09:03:31,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.91 | bwd_microstep: 5254.13 | bwd_inner_microstep: 4846.46 | bwd_allreduce_microstep: 407.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 09:03:39,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.39 | bwd_microstep: 4718.67 | bwd_inner_microstep: 4688.58 | bwd_allreduce_microstep: 30.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 09:03:48,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.37 | bwd_microstep: 5183.01 | bwd_inner_microstep: 5099.11 | bwd_allreduce_microstep: 
83.84 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2095 [2024-07-31 09:03:56,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.00 | bwd_microstep: 5110.40 | bwd_inner_microstep: 4713.27 | bwd_allreduce_microstep: 397.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 09:04:05,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.67 | bwd_microstep: 4969.65 | bwd_inner_microstep: 4919.16 | bwd_allreduce_microstep: 50.42 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3658 [2024-07-31 09:04:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 09:04:13,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3141.81 | bwd_microstep: 4850.31 | bwd_inner_microstep: 4804.92 | bwd_allreduce_microstep: 45.33 | step_microstep: 181.67 [2024-07-31 09:04:13,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28228.12 | bwd: 40722.86 | bwd_inner: 39655.95 | bwd_allreduce: 1066.42 | step: 182.26 29%|██▊ | 351/1230 [6:52:19<17:04:19, 69.92s/it] {'loss': 1.2136, 'learning_rate': 1.677179595525e-05, 'epoch': 0.29} 29%|██▊ | 351/1230 [6:52:19<17:04:19, 69.92s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2058 [2024-07-31 09:04:22,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.53 | bwd_microstep: 5506.73 | bwd_inner_microstep: 5082.61 | bwd_allreduce_microstep: 424.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 09:04:31,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.63 | bwd_microstep: 5206.18 | bwd_inner_microstep: 5147.63 | bwd_allreduce_microstep: 58.49 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3810 [2024-07-31 09:04:40,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.85 | bwd_microstep: 5163.01 | bwd_inner_microstep: 5117.36 | bwd_allreduce_microstep: 45.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-31 09:04:49,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.27 | bwd_microstep: 5062.53 | bwd_inner_microstep: 5036.22 | bwd_allreduce_microstep: 26.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 09:04:58,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.24 | bwd_microstep: 5230.91 | bwd_inner_microstep: 5166.63 | bwd_allreduce_microstep: 64.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 09:05:06,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.15 | bwd_microstep: 5145.95 | bwd_inner_microstep: 5073.18 | bwd_allreduce_microstep: 72.70 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3174 [2024-07-31 09:05:15,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.09 | bwd_microstep: 5037.52 | bwd_inner_microstep: 4856.03 | bwd_allreduce_microstep: 181.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 09:05:24,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 09:05:24,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.36 | bwd_microstep: 5037.84 | bwd_inner_microstep: 4980.78 | bwd_allreduce_microstep: 56.99 | step_microstep: 181.68 [2024-07-31 09:05:24,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28934.02 | bwd: 41390.64 | bwd_inner: 40460.37 | bwd_allreduce: 929.77 | step: 182.27 29%|██▊ | 352/1230 
[6:53:30<17:06:23, 70.14s/it] {'loss': 1.2162, 'learning_rate': 1.6752395802971407e-05, 'epoch': 0.29} 29%|██▊ | 352/1230 [6:53:30<17:06:23, 70.14s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4058 [2024-07-31 09:05:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3382.68 | bwd_microstep: 5178.78 | bwd_inner_microstep: 5159.68 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569 [2024-07-31 09:05:41,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3321.92 | bwd_microstep: 5063.21 | bwd_inner_microstep: 4992.65 | bwd_allreduce_microstep: 70.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-31 09:05:50,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.78 | bwd_microstep: 5066.77 | bwd_inner_microstep: 5039.60 | bwd_allreduce_microstep: 27.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-31 09:05:58,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.21 | bwd_microstep: 5202.19 | bwd_inner_microstep: 5118.45 | bwd_allreduce_microstep: 83.68 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785 [2024-07-31 09:06:08,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 4068.04 | bwd_microstep: 5024.07 | bwd_inner_microstep: 5004.64 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 09:06:16,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.92 | bwd_microstep: 5154.40 | bwd_inner_microstep: 5077.44 | bwd_allreduce_microstep: 76.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 
[2024-07-31 09:06:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.10 | bwd_microstep: 5132.45 | bwd_inner_microstep: 4733.44 | bwd_allreduce_microstep: 398.94 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 09:06:34,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 09:06:34,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.65 | bwd_microstep: 4887.26 | bwd_inner_microstep: 4867.91 | bwd_allreduce_microstep: 19.27 | step_microstep: 181.55
[2024-07-31 09:06:34,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29020.20 | bwd: 40709.12 | bwd_inner: 39993.76 | bwd_allreduce: 714.86 | step: 182.23
29%|██▊ | 353/1230 [6:54:40<17:04:53, 70.12s/it]
{'loss': 1.2377, 'learning_rate': 1.6732948825831657e-05, 'epoch': 0.29}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4095
[2024-07-31 09:06:42,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3397.60 | bwd_microstep: 5195.00 | bwd_inner_microstep: 5172.49 | bwd_allreduce_microstep: 22.44 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3921
[2024-07-31 09:06:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.69 | bwd_microstep: 5177.92 | bwd_inner_microstep: 5158.52 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784
[2024-07-31 09:07:00,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.62 | bwd_microstep: 5195.69 | bwd_inner_microstep: 5142.15 | bwd_allreduce_microstep: 53.47 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3595
[2024-07-31 09:07:09,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.31 | bwd_microstep: 5143.69 | bwd_inner_microstep: 5050.00 | bwd_allreduce_microstep: 93.62 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 09:07:18,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.66 | bwd_microstep: 5161.08 | bwd_inner_microstep: 5106.18 | bwd_allreduce_microstep: 54.82 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2110
[2024-07-31 09:07:26,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.91 | bwd_microstep: 5121.51 | bwd_inner_microstep: 4723.96 | bwd_allreduce_microstep: 397.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 09:07:35,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.24 | bwd_microstep: 5043.41 | bwd_inner_microstep: 4984.58 | bwd_allreduce_microstep: 58.76 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656
[2024-07-31 09:07:44,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 09:07:44,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.13 | bwd_microstep: 4992.48 | bwd_inner_microstep: 4942.48 | bwd_allreduce_microstep: 49.93 | step_microstep: 181.24
[2024-07-31 09:07:44,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28616.06 | bwd: 41030.76 | bwd_inner: 40280.31 | bwd_allreduce: 749.97 | step: 181.84
29%|██▉ | 354/1230 [6:55:50<17:03:06, 70.08s/it]
{'loss': 1.1747, 'learning_rate': 1.6713455158686878e-05, 'epoch': 0.29}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 09:07:53,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.83 | bwd_microstep: 5552.09 | bwd_inner_microstep: 5389.96 | bwd_allreduce_microstep: 162.06 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3560
[2024-07-31 09:08:02,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.79 | bwd_microstep: 5430.37 | bwd_inner_microstep: 5237.10 | bwd_allreduce_microstep: 193.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753
[2024-07-31 09:08:11,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.66 | bwd_microstep: 5058.47 | bwd_inner_microstep: 5018.28 | bwd_allreduce_microstep: 40.12 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759
[2024-07-31 09:08:20,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.76 | bwd_microstep: 5023.21 | bwd_inner_microstep: 5003.46 | bwd_allreduce_microstep: 19.69 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 09:08:28,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.02 | bwd_microstep: 5086.03 | bwd_inner_microstep: 5041.49 | bwd_allreduce_microstep: 44.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163
[2024-07-31 09:08:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.42 | bwd_microstep: 5155.44 | bwd_inner_microstep: 4757.39 | bwd_allreduce_microstep: 397.98 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 09:08:46,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.94 | bwd_microstep: 4977.61 | bwd_inner_microstep: 4958.20 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163
[2024-07-31 09:08:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 09:08:54,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3011.47 | bwd_microstep: 4880.13 | bwd_inner_microstep: 4504.78 | bwd_allreduce_microstep: 375.28 | step_microstep: 414.33
[2024-07-31 09:08:54,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28533.77 | bwd: 41163.34 | bwd_inner: 39910.59 | bwd_allreduce: 1252.26 | step: 414.92
29%|██▉ | 355/1230 [6:57:00<17:02:45, 70.13s/it]
{'loss': 1.186, 'learning_rate': 1.6693914936716983e-05, 'epoch': 0.29}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4094
[2024-07-31 09:09:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.62 | bwd_microstep: 5389.03 | bwd_inner_microstep: 5369.92 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3938
[2024-07-31 09:09:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3808.61 | bwd_microstep: 5177.10 | bwd_inner_microstep: 5157.74 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3936
[2024-07-31 09:09:21,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3818.16 | bwd_microstep: 5174.72 | bwd_inner_microstep: 5155.39 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 09:09:30,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.40 | bwd_microstep: 5183.15 | bwd_inner_microstep: 5105.58 | bwd_allreduce_microstep: 77.51 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2213
[2024-07-31 09:09:39,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.10 | bwd_microstep: 5267.75 | bwd_inner_microstep: 4860.99 | bwd_allreduce_microstep: 406.69 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 09:09:48,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.36 | bwd_microstep: 4969.48 | bwd_inner_microstep: 4950.17 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702
[2024-07-31 09:09:56,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3208.96 | bwd_microstep: 4729.94 | bwd_inner_microstep: 4706.49 | bwd_allreduce_microstep: 23.38 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152
[2024-07-31 09:10:04,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 09:10:04,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.57 | bwd_microstep: 5048.24 | bwd_inner_microstep: 4654.18 | bwd_allreduce_microstep: 394.00 | step_microstep: 181.63
[2024-07-31 09:10:04,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29092.67 | bwd: 40939.39 | bwd_inner: 39960.40 | bwd_allreduce: 978.49 | step: 182.21
29%|██▉ | 356/1230 [6:58:10<17:02:37, 70.20s/it]
{'loss': 1.1659, 'learning_rate': 1.6674328295424723e-05, 'epoch': 0.29}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2347
[2024-07-31 09:10:13,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.93 | bwd_microstep: 5415.27 | bwd_inner_microstep: 4999.00 | bwd_allreduce_microstep: 416.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 09:10:22,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.47 | bwd_microstep: 5035.74 | bwd_inner_microstep: 5009.81 | bwd_allreduce_microstep: 25.87 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578
[2024-07-31 09:10:31,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.18 | bwd_microstep: 5166.48 | bwd_inner_microstep: 5078.85 | bwd_allreduce_microstep: 87.56 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597
[2024-07-31 09:10:40,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.24 | bwd_microstep: 5152.32 | bwd_inner_microstep: 5069.60 | bwd_allreduce_microstep: 82.66 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705
[2024-07-31 09:10:49,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.21 | bwd_microstep: 5006.26 | bwd_inner_microstep: 4969.51 | bwd_allreduce_microstep: 36.69 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648
[2024-07-31 09:10:57,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.36 | bwd_microstep: 5010.54 | bwd_inner_microstep: 4954.95 | bwd_allreduce_microstep: 55.51 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 09:11:06,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.67 | bwd_microstep: 5098.81 | bwd_inner_microstep: 5026.69 | bwd_allreduce_microstep: 72.06 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145
[2024-07-31 09:11:15,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 09:11:15,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.55 | bwd_microstep: 5223.26 | bwd_inner_microstep: 4818.48 | bwd_allreduce_microstep: 404.71 | step_microstep: 181.72
[2024-07-31 09:11:15,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29007.52 | bwd: 41108.67 | bwd_inner: 39926.82 | bwd_allreduce: 1181.36 | step: 182.30
29%|██▉ | 357/1230 [6:59:21<17:02:31, 70.28s/it]
{'loss': 1.2005, 'learning_rate': 1.6654695370634738e-05, 'epoch': 0.29}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3998
[2024-07-31 09:11:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3832.09 | bwd_microstep: 5267.52 | bwd_inner_microstep: 5248.40 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2303
[2024-07-31 09:11:33,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.73 | bwd_microstep: 5259.18 | bwd_inner_microstep: 4849.95 | bwd_allreduce_microstep: 409.15 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2137
[2024-07-31 09:11:42,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.94 | bwd_microstep: 5270.93 | bwd_inner_microstep: 4862.66 | bwd_allreduce_microstep: 408.20 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722
[2024-07-31 09:11:50,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.84 | bwd_microstep: 5042.18 | bwd_inner_microstep: 4999.69 | bwd_allreduce_microstep: 42.41 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151
[2024-07-31 09:11:59,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.72 | bwd_microstep: 5267.46 | bwd_inner_microstep: 4858.69 | bwd_allreduce_microstep: 408.71 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691
[2024-07-31 09:12:08,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.07 | bwd_microstep: 4999.78 | bwd_inner_microstep: 4946.19 | bwd_allreduce_microstep: 53.52 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163
[2024-07-31 09:12:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.63 | bwd_microstep: 5204.45 | bwd_inner_microstep: 4798.41 | bwd_allreduce_microstep: 405.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166
[2024-07-31 09:12:25,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 09:12:25,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.29 | bwd_microstep: 5013.03 | bwd_inner_microstep: 4625.93 | bwd_allreduce_microstep: 387.03 | step_microstep: 181.39
[2024-07-31 09:12:25,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28257.21 | bwd: 41324.50 | bwd_inner: 39189.86 | bwd_allreduce: 2134.13 | step: 182.09
29%|██▉ | 358/1230 [7:00:31<16:59:45, 70.17s/it]
{'loss': 1.1649, 'learning_rate': 1.6635016298492628e-05, 'epoch': 0.29}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4075
[2024-07-31 09:12:34,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.66 | bwd_microstep: 5348.04 | bwd_inner_microstep: 5328.98 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 770
[2024-07-31 09:12:43,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.17 | bwd_microstep: 5369.54 | bwd_inner_microstep: 4954.83 | bwd_allreduce_microstep: 414.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581
[2024-07-31 09:12:52,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.61 | bwd_microstep: 5373.99 | bwd_inner_microstep: 5224.68 | bwd_allreduce_microstep: 149.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737
[2024-07-31 09:13:01,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.95 | bwd_microstep: 4974.30 | bwd_inner_microstep: 4954.85 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179
[2024-07-31 09:13:09,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.96 | bwd_microstep: 5163.24 | bwd_inner_microstep: 4761.70 | bwd_allreduce_microstep: 401.47 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 09:13:18,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.72 | bwd_microstep: 4978.45 | bwd_inner_microstep: 4959.11 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 09:13:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.38 | bwd_microstep: 5074.34 | bwd_inner_microstep: 5012.02 | bwd_allreduce_microstep: 62.26 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2131
[2024-07-31 09:13:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 09:13:36,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.61 | bwd_microstep: 5156.08 | bwd_inner_microstep: 4754.66 | bwd_allreduce_microstep: 401.35 | step_microstep: 182.34
[2024-07-31 09:13:36,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29130.97 | bwd: 41437.95 | bwd_inner: 39950.77 | bwd_allreduce: 1486.69 | step: 182.92
29%|██▉ | 359/1230 [7:01:42<17:01:46, 70.39s/it]
{'loss': 1.157, 'learning_rate': 1.6615291215464005e-05, 'epoch': 0.29}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570
[2024-07-31 09:13:45,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.63 | bwd_microstep: 5359.43 | bwd_inner_microstep: 5220.38 | bwd_allreduce_microstep: 138.98 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192
[2024-07-31 09:13:53,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.82 | bwd_microstep: 5220.20 | bwd_inner_microstep: 4814.69 | bwd_allreduce_microstep: 405.44 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208
[2024-07-31 09:14:02,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.15 | bwd_microstep: 5204.77 | bwd_inner_microstep: 4799.16 | bwd_allreduce_microstep: 405.54 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737
[2024-07-31 09:14:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.52 | bwd_microstep: 4992.29 | bwd_inner_microstep: 4970.07 | bwd_allreduce_microstep: 22.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 09:14:20,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.73 | bwd_microstep: 5002.93 | bwd_inner_microstep: 4950.40 | bwd_allreduce_microstep: 52.47 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2868
[2024-07-31 09:14:28,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.84 | bwd_microstep: 5205.01 | bwd_inner_microstep: 4797.60 | bwd_allreduce_microstep: 407.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2108
[2024-07-31 09:14:37,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.58 | bwd_microstep: 5079.66 | bwd_inner_microstep: 4684.04 | bwd_allreduce_microstep: 395.55 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2111
[2024-07-31 09:14:46,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 09:14:46,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.87 | bwd_microstep: 5106.24 | bwd_inner_microstep: 4710.07 | bwd_allreduce_microstep: 396.09 | step_microstep: 181.32
[2024-07-31 09:14:46,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28616.04 | bwd: 41170.52 | bwd_inner: 38946.36 | bwd_allreduce: 2223.67 | step: 181.89
29%|██▉ | 360/1230 [7:02:52<16:59:25, 70.31s/it]
{'loss': 1.211, 'learning_rate': 1.6595520258333545e-05, 'epoch': 0.29}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3555
[2024-07-31 09:14:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.27 | bwd_microstep: 5260.46 | bwd_inner_microstep: 5163.41 | bwd_allreduce_microstep: 96.99 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2819
[2024-07-31 09:15:04,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.95 | bwd_microstep: 5358.44 | bwd_inner_microstep: 4942.35 | bwd_allreduce_microstep: 416.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769
[2024-07-31 09:15:12,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.69 | bwd_microstep: 5004.38 | bwd_inner_microstep: 4984.98 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234
[2024-07-31 09:15:21,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.13 | bwd_microstep: 5213.92 | bwd_inner_microstep: 4808.30 | bwd_allreduce_microstep: 405.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742
[2024-07-31 09:15:30,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.10 | bwd_microstep: 4979.08 | bwd_inner_microstep: 4959.68 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 09:15:38,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.88 | bwd_microstep: 4999.72 | bwd_inner_microstep: 4948.90 | bwd_allreduce_microstep: 50.76 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3665
[2024-07-31 09:15:47,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.88 | bwd_microstep: 5109.70 | bwd_inner_microstep: 5027.16 | bwd_allreduce_microstep: 82.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662
[2024-07-31 09:15:55,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 09:15:55,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3180.83 | bwd_microstep: 4686.20 | bwd_inner_microstep: 4666.90 | bwd_allreduce_microstep: 19.23 | step_microstep: 182.14
[2024-07-31 09:15:55,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28518.59 | bwd: 40611.89 | bwd_inner: 39501.62 | bwd_allreduce: 1109.78 | step: 182.72
29%|██▉ | 361/1230 [7:04:01<16:54:35, 70.05s/it]
{'loss': 1.186, 'learning_rate': 1.657570356420404e-05, 'epoch': 0.29}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766
[2024-07-31 09:16:04,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.19 | bwd_microstep: 5387.75 | bwd_inner_microstep: 5312.99 | bwd_allreduce_microstep: 74.70 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737
[2024-07-31 09:16:13,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.93 | bwd_microstep: 5176.34 | bwd_inner_microstep: 5121.76 | bwd_allreduce_microstep: 54.52 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580
[2024-07-31 09:16:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.54 | bwd_microstep: 5132.29 | bwd_inner_microstep: 5050.75 | bwd_allreduce_microstep: 81.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 09:16:30,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.76 | bwd_microstep: 5031.96 | bwd_inner_microstep: 4972.69 | bwd_allreduce_microstep: 59.21 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 09:16:39,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.63 | bwd_microstep: 5170.62 | bwd_inner_microstep: 5116.09 | bwd_allreduce_microstep: 54.46 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669
[2024-07-31 09:16:47,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.10 | bwd_microstep: 4685.02 | bwd_inner_microstep: 4664.39 | bwd_allreduce_microstep: 20.56 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2116
[2024-07-31 09:16:56,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.14 | bwd_microstep: 5041.41 | bwd_inner_microstep: 4649.36 | bwd_allreduce_microstep: 391.98 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 09:17:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 09:17:05,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.79 | bwd_microstep: 5317.79 | bwd_inner_microstep: 5130.64 | bwd_allreduce_microstep: 187.08 | step_microstep: 181.43
[2024-07-31 09:17:05,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28209.99 | bwd: 40943.18 | bwd_inner: 40018.61 | bwd_allreduce: 924.07 | step: 182.12
29%|██▉ | 362/1230 [7:05:11<16:50:56, 69.88s/it]
{'loss': 1.1863, 'learning_rate': 1.6555841270495456e-05, 'epoch': 0.29}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2348
[2024-07-31 09:17:14,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.75 | bwd_microstep: 5576.94 | bwd_inner_microstep: 5150.28 | bwd_allreduce_microstep: 426.59 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3974
[2024-07-31 09:17:23,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.46 | bwd_microstep: 5232.86 | bwd_inner_microstep: 5179.10 | bwd_allreduce_microstep: 53.69 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570
[2024-07-31 09:17:32,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.92 | bwd_microstep: 5183.34 | bwd_inner_microstep: 5096.67 | bwd_allreduce_microstep: 86.60 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 09:17:40,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.36 | bwd_microstep: 5073.57 | bwd_inner_microstep: 5004.85 | bwd_allreduce_microstep: 68.65 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3756
[2024-07-31 09:17:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.63 | bwd_microstep: 5006.67 | bwd_inner_microstep: 4950.18 | bwd_allreduce_microstep: 56.42 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3637
[2024-07-31 09:17:58,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.24 | bwd_microstep: 5154.29 | bwd_inner_microstep: 5067.88 | bwd_allreduce_microstep: 86.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707
[2024-07-31 09:18:06,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.53 | bwd_microstep: 5008.74 | bwd_inner_microstep: 4957.20 | bwd_allreduce_microstep: 51.47 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 09:18:15,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 09:18:15,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.36 | bwd_microstep: 4894.33 | bwd_inner_microstep: 4874.95 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.56
[2024-07-31 09:18:15,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28831.17 | bwd: 41130.71 | bwd_inner: 40281.03 | bwd_allreduce: 849.17 | step: 182.16
30%|██▉ | 363/1230 [7:06:21<16:51:33, 70.00s/it]
{'loss': 1.1728, 'learning_rate': 1.6535933514943955e-05, 'epoch': 0.3}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4080
[2024-07-31 09:18:24,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3882.79 | bwd_microstep: 5381.97 | bwd_inner_microstep: 5362.92 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723
[2024-07-31 09:18:33,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.65 | bwd_microstep: 5083.95 | bwd_inner_microstep: 5038.85 | bwd_allreduce_microstep: 45.03 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601
[2024-07-31 09:18:42,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.01 | bwd_microstep: 5028.07 | bwd_inner_microstep: 4968.69 | bwd_allreduce_microstep: 59.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708
[2024-07-31 09:18:50,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.47 | bwd_microstep: 5197.54 | bwd_inner_microstep: 5118.47 | bwd_allreduce_microstep: 79.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3653
[2024-07-31 09:18:59,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.93 | bwd_microstep: 4868.24 | bwd_inner_microstep: 4848.87 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699
[2024-07-31 09:19:08,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.66 | bwd_microstep: 5022.89 | bwd_inner_microstep: 4966.84 | bwd_allreduce_microstep: 55.98 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3666
[2024-07-31 09:19:16,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.32 | bwd_microstep: 4913.54 | bwd_inner_microstep: 4887.76 | bwd_allreduce_microstep: 25.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736
[2024-07-31 09:19:25,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 09:19:25,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.73 | bwd_microstep: 5010.28 | bwd_inner_microstep: 4990.94 | bwd_allreduce_microstep: 19.28 | step_microstep: 182.73
[2024-07-31 09:19:25,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29343.46 | bwd: 40506.46 | bwd_inner: 40183.29 | bwd_allreduce: 322.68 | step: 183.30
30%|██▉ | 364/1230 [7:07:31<16:51:11, 70.06s/it]
{'loss': 1.1642, 'learning_rate': 1.6515980435600965e-05, 'epoch': 0.3}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3899
[2024-07-31 09:19:34,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.35 | bwd_microstep: 5527.29 | bwd_inner_microstep: 5438.12 | bwd_allreduce_microstep: 89.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751
[2024-07-31 09:19:43,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.57 | bwd_microstep: 4990.12 | bwd_inner_microstep: 4970.82 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 09:19:52,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.27 | bwd_microstep: 5106.37 | bwd_inner_microstep: 5037.11 | bwd_allreduce_microstep: 69.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617
[2024-07-31 09:20:00,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.54 | bwd_microstep: 4734.08 | bwd_inner_microstep: 4707.75 | bwd_allreduce_microstep: 26.26 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3679
[2024-07-31 09:20:09,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.94 | bwd_microstep: 5105.15 | bwd_inner_microstep: 5051.58 | bwd_allreduce_microstep: 53.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 09:20:17,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.23 | bwd_microstep: 5305.76 | bwd_inner_microstep: 4895.77 | bwd_allreduce_microstep: 409.92 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 09:20:26,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3240.53 | bwd_microstep: 4890.74 | bwd_inner_microstep: 4843.04 | bwd_allreduce_microstep: 47.63 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145
[2024-07-31 09:20:34,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 09:20:34,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.26 | bwd_microstep: 5056.47 | bwd_inner_microstep: 4665.19 | bwd_allreduce_microstep: 391.22 | step_microstep: 181.24
[2024-07-31 09:20:34,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28134.60 | bwd: 40715.96 | bwd_inner: 39609.31 | bwd_allreduce: 1106.16 | step: 181.83
30%|██▉ | 365/1230 [7:08:40<16:46:13, 69.80s/it]
{'loss': 1.1872, 'learning_rate': 1.6495982170832224e-05, 'epoch': 0.3}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4080
[2024-07-31 09:20:44,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.69 | bwd_microstep: 5605.28 | bwd_inner_microstep: 5541.26 | bwd_allreduce_microstep: 63.95 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2278
[2024-07-31 09:20:53,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.53 | bwd_microstep: 5281.39 | bwd_inner_microstep: 4871.37 | bwd_allreduce_microstep: 409.94 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809
[2024-07-31 09:21:01,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.93 | bwd_microstep: 5037.87 | bwd_inner_microstep: 5018.55 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2264
[2024-07-31 09:21:10,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.54 | bwd_microstep: 5208.96 | bwd_inner_microstep: 4802.95 | bwd_allreduce_microstep: 405.94 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709
[2024-07-31 09:21:19,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.43 | bwd_microstep: 5237.69 | bwd_inner_microstep: 5154.97 | bwd_allreduce_microstep: 82.65 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749
[2024-07-31 09:21:28,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.53 | bwd_microstep: 5207.82 | bwd_inner_microstep: 5150.14 | bwd_allreduce_microstep: 57.61 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635
[2024-07-31 09:21:37,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.18 | bwd_microstep: 5167.40 | bwd_inner_microstep: 5086.73 | bwd_allreduce_microstep: 80.60 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 09:21:45,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72
[2024-07-31 09:21:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.01 | bwd_microstep: 5022.55 | bwd_inner_microstep: 4969.10 | bwd_allreduce_microstep: 53.38 | step_microstep: 182.12
[2024-07-31 09:21:45,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29053.74 | bwd: 41768.93 | bwd_inner: 40595.02 | bwd_allreduce: 1173.42 | step: 182.72
30%|██▉ | 366/1230 [7:09:51<16:50:55, 70.20s/it]
{'loss': 1.1606, 'learning_rate': 1.6475938859316795e-05, 'epoch': 0.3}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 09:21:55,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3891.31 | bwd_microstep: 5419.01 | bwd_inner_microstep: 5391.64 | bwd_allreduce_microstep: 27.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 09:22:04,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.49 | bwd_microstep: 5374.79 | bwd_inner_microstep: 5274.51 | bwd_allreduce_microstep: 100.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 09:22:12,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.03 | bwd_microstep: 4872.47 | bwd_inner_microstep: 4823.74 | bwd_allreduce_microstep: 48.67 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670
[2024-07-31 09:22:20,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.97 | bwd_microstep: 4733.03 | bwd_inner_microstep: 4707.05 | bwd_allreduce_microstep: 25.91 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3610
[2024-07-31 09:22:29,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.70 | bwd_microstep: 5190.30 | bwd_inner_microstep: 5123.63 | bwd_allreduce_microstep: 66.61 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621
[2024-07-31 09:22:38,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.59 | bwd_microstep: 5059.49 | bwd_inner_microstep: 4993.37 | bwd_allreduce_microstep: 66.05 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3680
[2024-07-31 09:22:46,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.65 | bwd_microstep: 5073.71 | bwd_inner_microstep: 4990.58 | bwd_allreduce_microstep: 83.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 09:22:54,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 09:22:54,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.65 | bwd_microstep: 4735.17 | bwd_inner_microstep: 4708.20 | bwd_allreduce_microstep: 26.90 | step_microstep: 183.43
[2024-07-31 09:22:54,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28043.29 | bwd: 40457.96 | bwd_inner: 40012.67 | bwd_allreduce: 444.81 | step: 184.03
30%|██▉ | 367/1230 [7:11:00<16:43:52, 69.79s/it]
{'loss': 1.211, 'learning_rate': 1.6455850640046134e-05, 'epoch': 0.3}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3864
[2024-07-31 09:23:04,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.77 | bwd_microstep: 5550.92 | bwd_inner_microstep: 5452.65 | bwd_allreduce_microstep: 98.21 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740
[2024-07-31 09:23:12,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.36 | bwd_microstep: 5211.21 | bwd_inner_microstep: 5150.42 | bwd_allreduce_microstep: 60.72 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592
[2024-07-31 09:23:21,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.82 | bwd_microstep: 5153.43 | bwd_inner_microstep: 5077.60 | bwd_allreduce_microstep: 75.76 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 09:23:30,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.61 | bwd_microstep: 5191.60 | bwd_inner_microstep: 5132.78 | bwd_allreduce_microstep: 58.75 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 09:23:39,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.25 | bwd_microstep: 5166.37 | bwd_inner_microstep: 5108.83 | bwd_allreduce_microstep: 57.47 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714
[2024-07-31 09:23:48,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.48 | bwd_microstep: 5074.96 | bwd_inner_microstep: 5040.84 | bwd_allreduce_microstep: 34.05 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length:
3683 [2024-07-31 09:23:56,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.22 | bwd_microstep: 4914.95 | bwd_inner_microstep: 4890.79 | bwd_allreduce_microstep: 24.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 09:24:05,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.86 [2024-07-31 09:24:05,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.11 | bwd_microstep: 4925.93 | bwd_inner_microstep: 4900.14 | bwd_allreduce_microstep: 25.72 | step_microstep: 181.68 [2024-07-31 09:24:05,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29322.53 | bwd: 41189.34 | bwd_inner: 40753.98 | bwd_allreduce: 434.87 | step: 182.38 30%|██▉ | 368/1230 [7:12:11<16:47:15, 70.11s/it] {'loss': 1.1654, 'learning_rate': 1.6435717652323097e-05, 'epoch': 0.3} 30%|██▉ | 368/1230 [7:12:11<16:47:15, 70.11s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4043 [2024-07-31 09:24:14,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.87 | bwd_microstep: 5432.01 | bwd_inner_microstep: 5389.09 | bwd_allreduce_microstep: 42.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3836 [2024-07-31 09:24:23,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.43 | bwd_microstep: 5302.68 | bwd_inner_microstep: 5234.80 | bwd_allreduce_microstep: 67.80 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3733 [2024-07-31 09:24:32,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.64 | bwd_microstep: 4984.92 | bwd_inner_microstep: 4938.97 | bwd_allreduce_microstep: 45.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2087 [2024-07-31 09:24:41,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3546.70 | bwd_microstep: 5249.53 | bwd_inner_microstep: 4844.13 | bwd_allreduce_microstep: 405.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-31 09:24:49,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.20 | bwd_microstep: 4980.66 | bwd_inner_microstep: 4961.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 09:24:58,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.05 | bwd_microstep: 5171.72 | bwd_inner_microstep: 5117.35 | bwd_allreduce_microstep: 54.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 09:25:07,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.36 | bwd_microstep: 4979.92 | bwd_inner_microstep: 4960.54 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 09:25:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 09:25:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.53 | bwd_microstep: 5094.07 | bwd_inner_microstep: 4697.02 | bwd_allreduce_microstep: 396.98 | step_microstep: 181.68 [2024-07-31 09:25:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29031.70 | bwd: 41195.50 | bwd_inner: 40143.12 | bwd_allreduce: 1051.87 | step: 182.28 30%|███ | 369/1230 [7:13:22<16:48:01, 70.25s/it] {'loss': 1.2022, 'learning_rate': 1.6415540035761008e-05, 'epoch': 0.3} 30%|███ | 369/1230 [7:13:22<16:48:01, 70.25s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3967 [2024-07-31 09:25:25,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.86 | bwd_microstep: 5265.08 | bwd_inner_microstep: 5231.36 | 
bwd_allreduce_microstep: 33.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-07-31 09:25:34,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.37 | bwd_microstep: 5128.84 | bwd_inner_microstep: 5099.70 | bwd_allreduce_microstep: 29.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 09:25:42,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.11 | bwd_microstep: 5089.46 | bwd_inner_microstep: 5022.15 | bwd_allreduce_microstep: 67.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-31 09:25:51,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.60 | bwd_microstep: 4997.21 | bwd_inner_microstep: 4977.89 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 09:26:00,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.65 | bwd_microstep: 5147.30 | bwd_inner_microstep: 4747.50 | bwd_allreduce_microstep: 399.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 09:26:08,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.99 | bwd_microstep: 5007.08 | bwd_inner_microstep: 4957.48 | bwd_allreduce_microstep: 49.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 09:26:17,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.68 | bwd_microstep: 5004.93 | bwd_inner_microstep: 4985.49 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 09:26:26,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 
[2024-07-31 09:26:26,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.17 | bwd_microstep: 5043.25 | bwd_inner_microstep: 4988.02 | bwd_allreduce_microstep: 55.17 | step_microstep: 181.44 [2024-07-31 09:26:26,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29277.34 | bwd: 40683.14 | bwd_inner: 40009.54 | bwd_allreduce: 673.11 | step: 182.01 30%|███ | 370/1230 [7:14:32<16:47:03, 70.26s/it] {'loss': 1.1947, 'learning_rate': 1.639531793028265e-05, 'epoch': 0.3} 30%|███ | 370/1230 [7:14:32<16:47:03, 70.26s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3981 [2024-07-31 09:26:35,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3871.38 | bwd_microstep: 5245.40 | bwd_inner_microstep: 5226.33 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789 [2024-07-31 09:26:44,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.66 | bwd_microstep: 5038.55 | bwd_inner_microstep: 5019.20 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778 [2024-07-31 09:26:53,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.47 | bwd_microstep: 5074.51 | bwd_inner_microstep: 5050.63 | bwd_allreduce_microstep: 23.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 09:27:01,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.02 | bwd_microstep: 4837.53 | bwd_inner_microstep: 4788.56 | bwd_allreduce_microstep: 48.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 09:27:10,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.52 | bwd_microstep: 5191.54 | bwd_inner_microstep: 4787.74 | bwd_allreduce_microstep: 
403.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3708 [2024-07-31 09:27:19,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.72 | bwd_microstep: 5181.54 | bwd_inner_microstep: 5082.68 | bwd_allreduce_microstep: 98.79 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2111 [2024-07-31 09:27:27,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.93 | bwd_microstep: 4992.30 | bwd_inner_microstep: 4607.87 | bwd_allreduce_microstep: 384.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 09:27:35,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 09:27:35,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.50 | bwd_microstep: 5024.71 | bwd_inner_microstep: 4967.70 | bwd_allreduce_microstep: 56.94 | step_microstep: 181.76 [2024-07-31 09:27:35,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28398.12 | bwd: 40586.05 | bwd_inner: 39530.63 | bwd_allreduce: 1054.92 | step: 182.34 30%|███ | 371/1230 [7:15:41<16:41:50, 69.98s/it] {'loss': 1.1684, 'learning_rate': 1.637505147611934e-05, 'epoch': 0.3} 30%|███ | 371/1230 [7:15:41<16:41:50, 69.98s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-31 09:27:45,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.46 | bwd_microstep: 5455.90 | bwd_inner_microstep: 5370.11 | bwd_allreduce_microstep: 85.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 09:27:53,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.10 | bwd_microstep: 5031.92 | bwd_inner_microstep: 5012.54 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3588 [2024-07-31 09:28:02,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.11 | bwd_microstep: 5210.68 | bwd_inner_microstep: 5129.97 | bwd_allreduce_microstep: 80.65 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1166 [2024-07-31 09:28:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.45 | bwd_microstep: 5242.24 | bwd_inner_microstep: 4836.34 | bwd_allreduce_microstep: 405.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 09:28:20,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.92 | bwd_microstep: 5211.41 | bwd_inner_microstep: 4804.51 | bwd_allreduce_microstep: 406.82 | step_microstep: 0.08 dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2983 [2024-07-31 09:28:28,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.31 | bwd_microstep: 5029.09 | bwd_inner_microstep: 4661.83 | bwd_allreduce_microstep: 367.19 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2935 [2024-07-31 09:28:37,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.60 | bwd_microstep: 5029.27 | bwd_inner_microstep: 4686.65 | bwd_allreduce_microstep: 342.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 09:28:46,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 09:28:46,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.28 | bwd_microstep: 5062.98 | bwd_inner_microstep: 4671.43 | bwd_allreduce_microstep: 391.48 | step_microstep: 181.45 [2024-07-31 09:28:46,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28571.15 | bwd: 41273.49 | bwd_inner: 39173.32 | bwd_allreduce: 2099.67 | step: 182.04 30%|███ | 
372/1230 [7:16:51<16:41:30, 70.04s/it] {'loss': 1.1719, 'learning_rate': 1.6354740813809917e-05, 'epoch': 0.3} 30%|███ | 372/1230 [7:16:51<16:41:30, 70.04s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2456 [2024-07-31 09:28:55,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.91 | bwd_microstep: 5378.23 | bwd_inner_microstep: 4962.28 | bwd_allreduce_microstep: 415.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3801 [2024-07-31 09:29:04,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.28 | bwd_microstep: 5437.73 | bwd_inner_microstep: 5350.60 | bwd_allreduce_microstep: 87.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240 [2024-07-31 09:29:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.62 | bwd_microstep: 5262.91 | bwd_inner_microstep: 4855.10 | bwd_allreduce_microstep: 407.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 09:29:21,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.51 | bwd_microstep: 5180.21 | bwd_inner_microstep: 5098.92 | bwd_allreduce_microstep: 81.22 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3644 [2024-07-31 09:29:30,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.16 | bwd_microstep: 5136.52 | bwd_inner_microstep: 5055.71 | bwd_allreduce_microstep: 80.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3644 [2024-07-31 09:29:39,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.96 | bwd_microstep: 4834.20 | bwd_inner_microstep: 4814.79 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 
2181 [2024-07-31 09:29:47,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.38 | bwd_microstep: 5054.68 | bwd_inner_microstep: 4661.15 | bwd_allreduce_microstep: 393.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 09:29:55,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 09:29:55,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.67 | bwd_microstep: 4682.51 | bwd_inner_microstep: 4659.66 | bwd_allreduce_microstep: 22.78 | step_microstep: 182.11 [2024-07-31 09:29:55,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28454.38 | bwd: 40966.96 | bwd_inner: 39458.15 | bwd_allreduce: 1508.33 | step: 182.69 30%|███ | 373/1230 [7:18:01<16:39:08, 69.95s/it] {'loss': 1.1496, 'learning_rate': 1.6334386084199787e-05, 'epoch': 0.3} 30%|███ | 373/1230 [7:18:01<16:39:08, 69.95s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-07-31 09:30:04,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.18 | bwd_microstep: 5148.45 | bwd_inner_microstep: 5115.82 | bwd_allreduce_microstep: 32.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 09:30:13,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.93 | bwd_microstep: 4993.98 | bwd_inner_microstep: 4974.53 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-31 09:30:21,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3438.44 | bwd_microstep: 5009.41 | bwd_inner_microstep: 4979.44 | bwd_allreduce_microstep: 29.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 09:30:30,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3770.47 | bwd_microstep: 5001.88 | bwd_inner_microstep: 4982.53 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2104 [2024-07-31 09:30:39,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.44 | bwd_microstep: 5235.58 | bwd_inner_microstep: 4829.98 | bwd_allreduce_microstep: 405.53 | step_microstep: 0.21 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 09:30:47,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3176.34 | bwd_microstep: 4793.62 | bwd_inner_microstep: 4759.34 | bwd_allreduce_microstep: 34.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3743 [2024-07-31 09:30:55,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3102.92 | bwd_microstep: 4776.15 | bwd_inner_microstep: 4749.11 | bwd_allreduce_microstep: 26.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-31 09:31:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 09:31:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.89 | bwd_microstep: 4982.19 | bwd_inner_microstep: 4933.43 | bwd_allreduce_microstep: 48.69 | step_microstep: 181.82 [2024-07-31 09:31:04,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28125.51 | bwd: 39941.23 | bwd_inner: 39324.10 | bwd_allreduce: 616.64 | step: 182.51 30%|███ | 374/1230 [7:19:10<16:31:19, 69.49s/it] {'loss': 1.2575, 'learning_rate': 1.631398742843995e-05, 'epoch': 0.3} 30%|███ | 374/1230 [7:19:10<16:31:19, 69.49s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4010 [2024-07-31 09:31:13,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3881.65 | bwd_microstep: 5371.46 | bwd_inner_microstep: 5344.24 | 
bwd_allreduce_microstep: 27.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-31 09:31:22,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.14 | bwd_microstep: 5164.85 | bwd_inner_microstep: 5082.16 | bwd_allreduce_microstep: 82.62 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2062 [2024-07-31 09:31:30,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.36 | bwd_microstep: 5205.21 | bwd_inner_microstep: 4802.95 | bwd_allreduce_microstep: 402.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-31 09:31:39,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.52 | bwd_microstep: 5171.51 | bwd_inner_microstep: 5116.56 | bwd_allreduce_microstep: 54.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 09:31:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.58 | bwd_microstep: 5003.57 | bwd_inner_microstep: 4984.23 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 09:31:57,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.86 | bwd_microstep: 5183.72 | bwd_inner_microstep: 5103.25 | bwd_allreduce_microstep: 80.41 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-31 09:32:05,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.34 | bwd_microstep: 5085.82 | bwd_inner_microstep: 4694.36 | bwd_allreduce_microstep: 391.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2143 [2024-07-31 09:32:14,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 
[2024-07-31 09:32:14,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3451.77 | bwd_microstep: 5045.46 | bwd_inner_microstep: 4651.12 | bwd_allreduce_microstep: 394.27 | step_microstep: 181.46 [2024-07-31 09:32:14,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28879.11 | bwd: 41231.56 | bwd_inner: 39778.80 | bwd_allreduce: 1452.27 | step: 182.05 30%|███ | 375/1230 [7:20:20<16:34:16, 69.77s/it] {'loss': 1.1629, 'learning_rate': 1.629354498798601e-05, 'epoch': 0.3} 30%|███ | 375/1230 [7:20:20<16:34:16, 69.77s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3836 [2024-07-31 09:32:23,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.44 | bwd_microstep: 5278.82 | bwd_inner_microstep: 5217.43 | bwd_allreduce_microstep: 61.32 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3566 [2024-07-31 09:32:32,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.09 | bwd_microstep: 5163.59 | bwd_inner_microstep: 5072.01 | bwd_allreduce_microstep: 91.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-31 09:32:41,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.12 | bwd_microstep: 5009.12 | bwd_inner_microstep: 4986.96 | bwd_allreduce_microstep: 22.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-31 09:32:49,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.86 | bwd_microstep: 5172.06 | bwd_inner_microstep: 5094.84 | bwd_allreduce_microstep: 77.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 09:32:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.84 | bwd_microstep: 5179.05 | bwd_inner_microstep: 4776.12 | bwd_allreduce_microstep: 
402.86 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2162 [2024-07-31 09:33:07,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.88 | bwd_microstep: 5132.72 | bwd_inner_microstep: 4732.87 | bwd_allreduce_microstep: 399.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 09:33:15,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3185.78 | bwd_microstep: 4717.31 | bwd_inner_microstep: 4693.77 | bwd_allreduce_microstep: 23.47 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-31 09:33:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 09:33:23,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3037.71 | bwd_microstep: 5014.29 | bwd_inner_microstep: 4629.29 | bwd_allreduce_microstep: 384.93 | step_microstep: 181.49 [2024-07-31 09:33:23,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27906.63 | bwd: 40666.94 | bwd_inner: 39203.22 | bwd_allreduce: 1463.23 | step: 182.08 31%|███ | 376/1230 [7:21:29<16:29:23, 69.51s/it] {'loss': 1.1316, 'learning_rate': 1.627305890459719e-05, 'epoch': 0.31} 31%|███ | 376/1230 [7:21:29<16:29:23, 69.51s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 09:33:32,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3862.34 | bwd_microstep: 5399.34 | bwd_inner_microstep: 5380.33 | bwd_allreduce_microstep: 18.94 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2162 [2024-07-31 09:33:41,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.70 | bwd_microstep: 5332.62 | bwd_inner_microstep: 4918.27 | bwd_allreduce_microstep: 414.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3747 [2024-07-31 09:33:50,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.24 | bwd_microstep: 5065.19 | bwd_inner_microstep: 5037.94 | bwd_allreduce_microstep: 27.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 09:33:59,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.77 | bwd_microstep: 5122.82 | bwd_inner_microstep: 5077.03 | bwd_allreduce_microstep: 45.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-31 09:34:08,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.98 | bwd_microstep: 5162.20 | bwd_inner_microstep: 5106.03 | bwd_allreduce_microstep: 56.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 09:34:16,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.82 | bwd_microstep: 5057.17 | bwd_inner_microstep: 4996.56 | bwd_allreduce_microstep: 60.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3667 [2024-07-31 09:34:25,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.55 | bwd_microstep: 4876.05 | bwd_inner_microstep: 4856.66 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-31 09:34:33,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 09:34:33,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3040.57 | bwd_microstep: 4965.52 | bwd_inner_microstep: 4581.27 | bwd_allreduce_microstep: 384.19 | step_microstep: 181.61 [2024-07-31 09:34:33,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28728.88 | bwd: 40980.90 | bwd_inner: 39954.03 | bwd_allreduce: 1026.39 | step: 182.19 31%|███ | 
377/1230 [7:22:39<16:30:28, 69.67s/it] {'loss': 1.1437, 'learning_rate': 1.625252932033538e-05, 'epoch': 0.31} 31%|███ | 377/1230 [7:22:39<16:30:28, 69.67s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4036 [2024-07-31 09:34:42,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3872.08 | bwd_microstep: 5383.03 | bwd_inner_microstep: 5355.12 | bwd_allreduce_microstep: 27.84 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3755 [2024-07-31 09:34:51,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.66 | bwd_microstep: 5147.08 | bwd_inner_microstep: 5077.99 | bwd_allreduce_microstep: 69.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 09:35:00,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.91 | bwd_microstep: 5008.50 | bwd_inner_microstep: 4985.43 | bwd_allreduce_microstep: 23.00 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3744 [2024-07-31 09:35:09,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.99 | bwd_microstep: 5197.46 | bwd_inner_microstep: 5127.83 | bwd_allreduce_microstep: 69.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 09:35:17,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.95 | bwd_microstep: 5111.53 | bwd_inner_microstep: 4713.60 | bwd_allreduce_microstep: 397.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 09:35:26,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.45 | bwd_microstep: 4983.16 | bwd_inner_microstep: 4963.75 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3771 [2024-07-31 09:35:35,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.06 | bwd_microstep: 5012.62 | bwd_inner_microstep: 4993.27 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 09:35:44,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 09:35:44,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.86 | bwd_microstep: 5134.62 | bwd_inner_microstep: 5066.40 | bwd_allreduce_microstep: 68.15 | step_microstep: 181.42 [2024-07-31 09:35:44,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29439.86 | bwd: 40977.97 | bwd_inner: 40283.32 | bwd_allreduce: 694.16 | step: 182.00 31%|███ | 378/1230 [7:23:50<16:33:56, 70.00s/it] {'loss': 1.2018, 'learning_rate': 1.6231956377564095e-05, 'epoch': 0.31} 31%|███ | 378/1230 [7:23:50<16:33:56, 70.00s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 09:35:52,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3379.29 | bwd_microstep: 5165.80 | bwd_inner_microstep: 5146.69 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824 [2024-07-31 09:36:01,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.17 | bwd_microstep: 5194.89 | bwd_inner_microstep: 5141.73 | bwd_allreduce_microstep: 53.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 09:36:10,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.47 | bwd_microstep: 5138.77 | bwd_inner_microstep: 5057.23 | bwd_allreduce_microstep: 81.47 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 09:36:18,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3240.49 | bwd_microstep: 4826.94 | bwd_inner_microstep: 4803.26 | bwd_allreduce_microstep: 23.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 09:36:27,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.75 | bwd_microstep: 5020.93 | bwd_inner_microstep: 4966.45 | bwd_allreduce_microstep: 54.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3655 [2024-07-31 09:36:35,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.27 | bwd_microstep: 4936.35 | bwd_inner_microstep: 4907.64 | bwd_allreduce_microstep: 28.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 09:36:44,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.93 | bwd_microstep: 5087.77 | bwd_inner_microstep: 5023.69 | bwd_allreduce_microstep: 64.02 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1123 [2024-07-31 09:36:53,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 09:36:53,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.40 | bwd_microstep: 5086.56 | bwd_inner_microstep: 4694.26 | bwd_allreduce_microstep: 392.23 | step_microstep: 182.25 [2024-07-31 09:36:53,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28206.67 | bwd: 40458.01 | bwd_inner: 39740.88 | bwd_allreduce: 716.64 | step: 182.85 31%|███ | 379/1230 [7:24:59<16:28:31, 69.70s/it] {'loss': 1.2288, 'learning_rate': 1.621134021894756e-05, 'epoch': 0.31} 31%|███ | 379/1230 [7:24:59<16:28:31, 69.70s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 09:37:02,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.62 | bwd_microstep: 5508.84 | bwd_inner_microstep: 5391.89 | 
bwd_allreduce_microstep: 116.89 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-31 09:37:11,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.41 | bwd_microstep: 5166.49 | bwd_inner_microstep: 4763.30 | bwd_allreduce_microstep: 403.13 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3593 [2024-07-31 09:37:19,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3130.88 | bwd_microstep: 4979.41 | bwd_inner_microstep: 4917.77 | bwd_allreduce_microstep: 61.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 09:37:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.40 | bwd_microstep: 4993.44 | bwd_inner_microstep: 4972.14 | bwd_allreduce_microstep: 21.23 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3630 [2024-07-31 09:37:36,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.34 | bwd_microstep: 5044.56 | bwd_inner_microstep: 4971.62 | bwd_allreduce_microstep: 72.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 09:37:45,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.90 | bwd_microstep: 5248.22 | bwd_inner_microstep: 4841.20 | bwd_allreduce_microstep: 406.94 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 09:37:54,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.39 | bwd_microstep: 5164.74 | bwd_inner_microstep: 5084.36 | bwd_allreduce_microstep: 80.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 09:38:03,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 
[2024-07-31 09:38:03,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.24 | bwd_microstep: 5152.73 | bwd_inner_microstep: 4751.94 | bwd_allreduce_microstep: 400.72 | step_microstep: 181.96 [2024-07-31 09:38:03,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28422.08 | bwd: 41258.42 | bwd_inner: 39694.15 | bwd_allreduce: 1563.77 | step: 182.66 31%|███ | 380/1230 [7:26:09<16:28:42, 69.79s/it] {'loss': 1.1403, 'learning_rate': 1.619068098744965e-05, 'epoch': 0.31} 31%|███ | 380/1230 [7:26:09<16:28:42, 69.79s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 09:38:12,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.99 | bwd_microstep: 5403.63 | bwd_inner_microstep: 5384.62 | bwd_allreduce_microstep: 18.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-07-31 09:38:21,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.11 | bwd_microstep: 5123.30 | bwd_inner_microstep: 5103.92 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 09:38:30,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.21 | bwd_microstep: 5189.96 | bwd_inner_microstep: 5151.28 | bwd_allreduce_microstep: 38.62 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1240 [2024-07-31 09:38:38,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.71 | bwd_microstep: 5121.51 | bwd_inner_microstep: 4727.35 | bwd_allreduce_microstep: 394.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-31 09:38:47,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.73 | bwd_microstep: 5053.83 | bwd_inner_microstep: 5030.06 | bwd_allreduce_microstep: 
23.71 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3641 [2024-07-31 09:38:55,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3139.62 | bwd_microstep: 4943.60 | bwd_inner_microstep: 4892.83 | bwd_allreduce_microstep: 50.70 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2135 [2024-07-31 09:39:04,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.15 | bwd_microstep: 5108.79 | bwd_inner_microstep: 4711.86 | bwd_allreduce_microstep: 396.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 09:39:13,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.82 [2024-07-31 09:39:13,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.74 | bwd_microstep: 4913.14 | bwd_inner_microstep: 4889.69 | bwd_allreduce_microstep: 23.38 | step_microstep: 181.89 [2024-07-31 09:39:13,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28787.17 | bwd: 40857.73 | bwd_inner: 39891.55 | bwd_allreduce: 965.69 | step: 182.47 31%|███ | 381/1230 [7:27:19<16:28:19, 69.85s/it] {'loss': 1.1945, 'learning_rate': 1.6169978826332955e-05, 'epoch': 0.31} 31%|███ | 381/1230 [7:27:19<16:28:19, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266 [2024-07-31 09:39:22,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.42 | bwd_microstep: 5256.17 | bwd_inner_microstep: 4852.26 | bwd_allreduce_microstep: 403.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3821 [2024-07-31 09:39:31,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.86 | bwd_microstep: 5252.01 | bwd_inner_microstep: 5177.67 | bwd_allreduce_microstep: 74.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3860 [2024-07-31 09:39:39,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.41 | bwd_microstep: 5109.24 | bwd_inner_microstep: 5089.94 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 09:39:48,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.33 | bwd_microstep: 5008.69 | bwd_inner_microstep: 4986.21 | bwd_allreduce_microstep: 22.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 09:39:57,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.04 | bwd_microstep: 5143.77 | bwd_inner_microstep: 5091.94 | bwd_allreduce_microstep: 51.77 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3741 [2024-07-31 09:40:06,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.51 | bwd_microstep: 5272.51 | bwd_inner_microstep: 5190.90 | bwd_allreduce_microstep: 81.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 09:40:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.27 | bwd_microstep: 5176.35 | bwd_inner_microstep: 5128.13 | bwd_allreduce_microstep: 48.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 09:40:24,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 09:40:24,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.93 | bwd_microstep: 5027.64 | bwd_inner_microstep: 5004.01 | bwd_allreduce_microstep: 23.56 | step_microstep: 182.32 [2024-07-31 09:40:24,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29277.68 | bwd: 41246.35 | bwd_inner: 40520.99 | bwd_allreduce: 724.87 | step: 182.90 31%|███ | 
382/1230 [7:28:30<16:31:27, 70.15s/it] {'loss': 1.1712, 'learning_rate': 1.6149233879157747e-05, 'epoch': 0.31} 31%|███ | 382/1230 [7:28:30<16:31:27, 70.15s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 09:40:33,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.84 | bwd_microstep: 5382.08 | bwd_inner_microstep: 5363.00 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 09:40:41,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.30 | bwd_microstep: 4916.92 | bwd_inner_microstep: 4863.81 | bwd_allreduce_microstep: 53.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3961 [2024-07-31 09:40:50,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.30 | bwd_microstep: 5096.14 | bwd_inner_microstep: 5064.94 | bwd_allreduce_microstep: 31.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 09:40:58,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3344.50 | bwd_microstep: 5044.14 | bwd_inner_microstep: 4990.65 | bwd_allreduce_microstep: 53.42 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-31 09:41:07,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.94 | bwd_microstep: 5216.82 | bwd_inner_microstep: 4810.70 | bwd_allreduce_microstep: 406.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 09:41:16,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.95 | bwd_microstep: 5084.12 | bwd_inner_microstep: 5018.59 | bwd_allreduce_microstep: 65.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3685 [2024-07-31 09:41:24,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.31 | bwd_microstep: 4886.32 | bwd_inner_microstep: 4866.93 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1110 [2024-07-31 09:41:33,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 09:41:33,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2981.84 | bwd_microstep: 4965.56 | bwd_inner_microstep: 4587.28 | bwd_allreduce_microstep: 378.21 | step_microstep: 183.47 [2024-07-31 09:41:33,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27931.89 | bwd: 40592.08 | bwd_inner: 39565.82 | bwd_allreduce: 1025.77 | step: 184.06 31%|███ | 383/1230 [7:29:38<16:24:47, 69.76s/it] {'loss': 1.1651, 'learning_rate': 1.6128446289781012e-05, 'epoch': 0.31} 31%|███ | 383/1230 [7:29:38<16:24:47, 69.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 09:41:42,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.40 | bwd_microstep: 5368.50 | bwd_inner_microstep: 5271.44 | bwd_allreduce_microstep: 96.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782 [2024-07-31 09:41:51,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.21 | bwd_microstep: 5188.68 | bwd_inner_microstep: 5150.92 | bwd_allreduce_microstep: 37.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-07-31 09:41:59,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.45 | bwd_microstep: 5056.46 | bwd_inner_microstep: 5031.65 | bwd_allreduce_microstep: 24.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 09:42:08,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3238.23 | bwd_microstep: 4909.15 | bwd_inner_microstep: 4859.14 | bwd_allreduce_microstep: 49.94 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 672 [2024-07-31 09:42:16,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2959.88 | bwd_microstep: 5017.52 | bwd_inner_microstep: 4633.99 | bwd_allreduce_microstep: 383.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 09:42:24,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.38 | bwd_microstep: 5141.33 | bwd_inner_microstep: 5068.53 | bwd_allreduce_microstep: 72.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677 [2024-07-31 09:42:33,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.58 | bwd_microstep: 4952.52 | bwd_inner_microstep: 4926.04 | bwd_allreduce_microstep: 26.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-31 09:42:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 09:42:42,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.63 | bwd_microstep: 5081.53 | bwd_inner_microstep: 4690.25 | bwd_allreduce_microstep: 391.21 | step_microstep: 182.60 [2024-07-31 09:42:42,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28261.66 | bwd: 40715.67 | bwd_inner: 39631.91 | bwd_allreduce: 1083.28 | step: 183.18 31%|███ | 384/1230 [7:30:48<16:21:43, 69.63s/it] {'loss': 1.2365, 'learning_rate': 1.610761620235543e-05, 'epoch': 0.31} 31%|███ | 384/1230 [7:30:48<16:21:43, 69.63s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3974 [2024-07-31 09:42:51,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.71 | bwd_microstep: 5381.01 | bwd_inner_microstep: 5313.67 | 
bwd_allreduce_microstep: 67.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882 [2024-07-31 09:43:00,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.95 | bwd_microstep: 5074.47 | bwd_inner_microstep: 5038.75 | bwd_allreduce_microstep: 35.65 | step_microstep: 0.11 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3797 [2024-07-31 09:43:09,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.03 | bwd_microstep: 5081.25 | bwd_inner_microstep: 5052.31 | bwd_allreduce_microstep: 28.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-31 09:43:17,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.41 | bwd_microstep: 4995.88 | bwd_inner_microstep: 4610.34 | bwd_allreduce_microstep: 385.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-31 09:43:25,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.35 | bwd_microstep: 5016.01 | bwd_inner_microstep: 4994.47 | bwd_allreduce_microstep: 21.47 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2109 [2024-07-31 09:43:34,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.13 | bwd_microstep: 5180.84 | bwd_inner_microstep: 4778.24 | bwd_allreduce_microstep: 402.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 09:43:43,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3441.81 | bwd_microstep: 5020.08 | bwd_inner_microstep: 4631.62 | bwd_allreduce_microstep: 388.40 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2119 [2024-07-31 09:43:51,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 
[2024-07-31 09:43:51,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3441.70 | bwd_microstep: 5031.44 | bwd_inner_microstep: 4643.91 | bwd_allreduce_microstep: 387.47 | step_microstep: 181.48 [2024-07-31 09:43:51,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28275.99 | bwd: 40780.97 | bwd_inner: 39063.23 | bwd_allreduce: 1717.24 | step: 182.11 31%|███▏ | 385/1230 [7:31:57<16:19:33, 69.55s/it] {'loss': 1.2245, 'learning_rate': 1.60867437613284e-05, 'epoch': 0.31} 31%|███▏ | 385/1230 [7:31:57<16:19:33, 69.55s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4068 [2024-07-31 09:44:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3915.54 | bwd_microstep: 5454.98 | bwd_inner_microstep: 5420.51 | bwd_allreduce_microstep: 34.40 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-31 09:44:09,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.08 | bwd_microstep: 5213.65 | bwd_inner_microstep: 4806.31 | bwd_allreduce_microstep: 407.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 09:44:18,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.62 | bwd_microstep: 5192.74 | bwd_inner_microstep: 5109.72 | bwd_allreduce_microstep: 82.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 09:44:27,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.20 | bwd_microstep: 5164.31 | bwd_inner_microstep: 5083.72 | bwd_allreduce_microstep: 80.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 09:44:36,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.65 | bwd_microstep: 4919.90 | bwd_inner_microstep: 4896.83 | bwd_allreduce_microstep: 
23.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 09:44:44,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.53 | bwd_microstep: 5024.70 | bwd_inner_microstep: 4964.59 | bwd_allreduce_microstep: 60.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2128 [2024-07-31 09:44:52,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3269.76 | bwd_microstep: 5021.25 | bwd_inner_microstep: 4634.35 | bwd_allreduce_microstep: 386.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 09:45:01,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 09:45:01,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.11 | bwd_microstep: 4909.61 | bwd_inner_microstep: 4888.50 | bwd_allreduce_microstep: 21.04 | step_microstep: 182.97 [2024-07-31 09:45:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28881.41 | bwd: 40901.11 | bwd_inner: 39804.47 | bwd_allreduce: 1096.15 | step: 183.56 31%|███▏ | 386/1230 [7:33:07<16:20:45, 69.72s/it] {'loss': 1.1262, 'learning_rate': 1.6065829111441e-05, 'epoch': 0.31} 31%|███▏ | 386/1230 [7:33:07<16:20:45, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4029 [2024-07-31 09:45:11,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.00 | bwd_microstep: 5519.29 | bwd_inner_microstep: 5454.44 | bwd_allreduce_microstep: 64.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-31 09:45:19,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.43 | bwd_microstep: 5021.61 | bwd_inner_microstep: 5001.72 | bwd_allreduce_microstep: 19.83 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2066 [2024-07-31 09:45:28,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.74 | bwd_microstep: 5207.05 | bwd_inner_microstep: 4802.39 | bwd_allreduce_microstep: 404.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-31 09:45:37,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.17 | bwd_microstep: 4996.68 | bwd_inner_microstep: 4977.28 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 09:45:46,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.54 | bwd_microstep: 4981.00 | bwd_inner_microstep: 4961.65 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 09:45:54,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.86 | bwd_microstep: 4999.68 | bwd_inner_microstep: 4976.32 | bwd_allreduce_microstep: 23.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 09:46:03,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.86 | bwd_microstep: 5096.71 | bwd_inner_microstep: 4702.09 | bwd_allreduce_microstep: 394.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 09:46:11,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 09:46:11,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.30 | bwd_microstep: 4763.01 | bwd_inner_microstep: 4737.08 | bwd_allreduce_microstep: 25.86 | step_microstep: 181.38 [2024-07-31 09:46:11,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28938.82 | bwd: 40585.02 | bwd_inner: 39612.91 | bwd_allreduce: 971.63 | step: 182.08 31%|███▏ | 
387/1230 [7:34:17<16:20:10, 69.76s/it] {'loss': 1.1627, 'learning_rate': 1.6044872397727037e-05, 'epoch': 0.31} 31%|███▏ | 387/1230 [7:34:17<16:20:10, 69.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3896 [2024-07-31 09:46:20,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.93 | bwd_microstep: 5178.27 | bwd_inner_microstep: 5133.53 | bwd_allreduce_microstep: 44.67 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756 [2024-07-31 09:46:28,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.80 | bwd_microstep: 4833.55 | bwd_inner_microstep: 4809.99 | bwd_allreduce_microstep: 23.49 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2050 [2024-07-31 09:46:37,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.97 | bwd_microstep: 5217.70 | bwd_inner_microstep: 4813.84 | bwd_allreduce_microstep: 403.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 09:46:46,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.51 | bwd_microstep: 5166.90 | bwd_inner_microstep: 4762.81 | bwd_allreduce_microstep: 404.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 09:46:54,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.92 | bwd_microstep: 5043.73 | bwd_inner_microstep: 4650.82 | bwd_allreduce_microstep: 392.84 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2110 [2024-07-31 09:47:03,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.61 | bwd_microstep: 5120.51 | bwd_inner_microstep: 4722.43 | bwd_allreduce_microstep: 398.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3687 [2024-07-31 09:47:11,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.91 | bwd_microstep: 4876.20 | bwd_inner_microstep: 4856.89 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 09:47:20,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 09:47:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.06 | bwd_microstep: 4998.27 | bwd_inner_microstep: 4944.44 | bwd_allreduce_microstep: 53.77 | step_microstep: 181.51 [2024-07-31 09:47:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28125.62 | bwd: 40435.11 | bwd_inner: 38694.68 | bwd_allreduce: 1739.94 | step: 182.11 32%|███▏ | 388/1230 [7:35:26<16:15:19, 69.50s/it] {'loss': 1.2391, 'learning_rate': 1.6023873765511993e-05, 'epoch': 0.32} 32%|███▏ | 388/1230 [7:35:26<16:15:19, 69.50s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4074 [2024-07-31 09:47:29,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.35 | bwd_microstep: 5146.73 | bwd_inner_microstep: 5127.67 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2279 [2024-07-31 09:47:37,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.14 | bwd_microstep: 5155.26 | bwd_inner_microstep: 4752.69 | bwd_allreduce_microstep: 402.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 09:47:46,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3360.95 | bwd_microstep: 4929.64 | bwd_inner_microstep: 4901.60 | bwd_allreduce_microstep: 27.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3718 [2024-07-31 09:47:55,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3642.42 | bwd_microstep: 5240.90 | bwd_inner_microstep: 5154.26 | bwd_allreduce_microstep: 86.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 09:48:03,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.71 | bwd_microstep: 5180.68 | bwd_inner_microstep: 5120.07 | bwd_allreduce_microstep: 60.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 09:48:12,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.44 | bwd_microstep: 5191.11 | bwd_inner_microstep: 5111.12 | bwd_allreduce_microstep: 79.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 09:48:21,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.24 | bwd_microstep: 4940.56 | bwd_inner_microstep: 4914.34 | bwd_allreduce_microstep: 26.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 09:48:30,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 09:48:30,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.03 | bwd_microstep: 5049.65 | bwd_inner_microstep: 4992.57 | bwd_allreduce_microstep: 57.02 | step_microstep: 181.56 [2024-07-31 09:48:30,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28571.17 | bwd: 40834.52 | bwd_inner: 40074.26 | bwd_allreduce: 759.78 | step: 182.13 32%|███▏ | 389/1230 [7:36:36<16:15:09, 69.57s/it] {'loss': 1.2052, 'learning_rate': 1.6002833360412044e-05, 'epoch': 0.32} 32%|███▏ | 389/1230 [7:36:36<16:15:09, 69.57s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920 [2024-07-31 09:48:39,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3823.12 | bwd_microstep: 5169.62 | bwd_inner_microstep: 5150.57 | 
bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-31 09:48:48,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.76 | bwd_microstep: 5178.19 | bwd_inner_microstep: 5125.60 | bwd_allreduce_microstep: 52.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2246 [2024-07-31 09:48:56,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.40 | bwd_microstep: 5114.73 | bwd_inner_microstep: 4720.35 | bwd_allreduce_microstep: 394.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 09:49:05,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.57 | bwd_microstep: 4992.18 | bwd_inner_microstep: 4972.77 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 09:49:14,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.54 | bwd_microstep: 5018.94 | bwd_inner_microstep: 4997.15 | bwd_allreduce_microstep: 21.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 09:49:22,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.25 | bwd_microstep: 4926.22 | bwd_inner_microstep: 4902.57 | bwd_allreduce_microstep: 23.59 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2145 [2024-07-31 09:49:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.79 | bwd_microstep: 5100.67 | bwd_inner_microstep: 4704.96 | bwd_allreduce_microstep: 395.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666 [2024-07-31 09:49:40,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 
[2024-07-31 09:49:40,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.00 | bwd_microstep: 5031.42 | bwd_inner_microstep: 4949.91 | bwd_allreduce_microstep: 81.44 | step_microstep: 183.30
[2024-07-31 09:49:40,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29226.32 | bwd: 40531.96 | bwd_inner: 39523.81 | bwd_allreduce: 1007.66 | step: 183.88
32%|███▏ | 390/1230 [7:37:46<16:16:10, 69.73s/it]
{'loss': 1.2199, 'learning_rate': 1.5981751328333036e-05, 'epoch': 0.32}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3996
[2024-07-31 09:49:49,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.86 | bwd_microstep: 5327.48 | bwd_inner_microstep: 5288.13 | bwd_allreduce_microstep: 39.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304
[2024-07-31 09:49:58,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.61 | bwd_microstep: 5257.70 | bwd_inner_microstep: 4849.45 | bwd_allreduce_microstep: 408.18 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3824
[2024-07-31 09:50:07,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.16 | bwd_microstep: 5079.56 | bwd_inner_microstep: 5053.97 | bwd_allreduce_microstep: 25.52 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778
[2024-07-31 09:50:15,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.57 | bwd_microstep: 5030.28 | bwd_inner_microstep: 5010.98 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832
[2024-07-31 09:50:24,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.82 | bwd_microstep: 5064.74 | bwd_inner_microstep: 5045.42 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750
[2024-07-31 09:50:33,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.92 | bwd_microstep: 5001.36 | bwd_inner_microstep: 4981.67 | bwd_allreduce_microstep: 19.62 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 09:50:42,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.93 | bwd_microstep: 5055.81 | bwd_inner_microstep: 4995.85 | bwd_allreduce_microstep: 59.90 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714
[2024-07-31 09:50:50,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 09:50:50,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.69 | bwd_microstep: 4813.99 | bwd_inner_microstep: 4791.87 | bwd_allreduce_microstep: 22.05 | step_microstep: 182.98
[2024-07-31 09:50:50,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29061.48 | bwd: 40630.90 | bwd_inner: 40017.28 | bwd_allreduce: 613.12 | step: 183.56
32%|███▏ | 391/1230 [7:38:56<16:16:16, 69.82s/it]
{'loss': 1.1462, 'learning_rate': 1.5960627815469482e-05, 'epoch': 0.32}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920
[2024-07-31 09:50:59,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.91 | bwd_microstep: 5162.46 | bwd_inner_microstep: 5143.35 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3851
[2024-07-31 09:51:08,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.38 | bwd_microstep: 5107.11 | bwd_inner_microstep: 5087.67 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601
[2024-07-31 09:51:17,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.78 | bwd_microstep: 5233.72 | bwd_inner_microstep: 5147.63 | bwd_allreduce_microstep: 86.01 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220
[2024-07-31 09:51:26,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.55 | bwd_microstep: 5252.37 | bwd_inner_microstep: 4847.03 | bwd_allreduce_microstep: 405.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776
[2024-07-31 09:51:34,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.58 | bwd_microstep: 5004.88 | bwd_inner_microstep: 4985.55 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712
[2024-07-31 09:51:43,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.30 | bwd_microstep: 5160.31 | bwd_inner_microstep: 5090.19 | bwd_allreduce_microstep: 70.05 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 09:51:52,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.82 | bwd_microstep: 4903.53 | bwd_inner_microstep: 4884.22 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699
[2024-07-31 09:52:01,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76
[2024-07-31 09:52:01,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.53 | bwd_microstep: 4896.53 | bwd_inner_microstep: 4877.19 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.69
[2024-07-31 09:52:01,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29533.75 | bwd: 40720.86 | bwd_inner: 40062.77 | bwd_allreduce: 657.60 | step: 182.30
32%|███▏ | 392/1230 [7:40:06<16:18:24, 70.05s/it]
{'loss': 1.1428, 'learning_rate': 1.5939462968303554e-05, 'epoch': 0.32}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2386
[2024-07-31 09:52:09,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3355.69 | bwd_microstep: 5198.26 | bwd_inner_microstep: 4802.23 | bwd_allreduce_microstep: 395.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603
[2024-07-31 09:52:18,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.60 | bwd_microstep: 5245.62 | bwd_inner_microstep: 5159.84 | bwd_allreduce_microstep: 85.71 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 09:52:27,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.06 | bwd_microstep: 5177.33 | bwd_inner_microstep: 5097.02 | bwd_allreduce_microstep: 80.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642
[2024-07-31 09:52:36,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.96 | bwd_microstep: 5121.91 | bwd_inner_microstep: 5052.73 | bwd_allreduce_microstep: 69.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773
[2024-07-31 09:52:44,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.32 | bwd_microstep: 5059.09 | bwd_inner_microstep: 5032.81 | bwd_allreduce_microstep: 26.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3668
[2024-07-31 09:52:53,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.48 | bwd_microstep: 4941.96 | bwd_inner_microstep: 4914.56 | bwd_allreduce_microstep: 27.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689
[2024-07-31 09:53:02,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.17 | bwd_microstep: 4934.12 | bwd_inner_microstep: 4907.55 | bwd_allreduce_microstep: 26.50 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682
[2024-07-31 09:53:10,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 09:53:10,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.11 | bwd_microstep: 4890.90 | bwd_inner_microstep: 4871.49 | bwd_allreduce_microstep: 19.33 | step_microstep: 182.38
[2024-07-31 09:53:10,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29031.28 | bwd: 40569.16 | bwd_inner: 39838.18 | bwd_allreduce: 730.49 | step: 183.07
32%|███▏ | 393/1230 [7:41:16<16:16:44, 70.02s/it]
{'loss': 1.1804, 'learning_rate': 1.5918256933604047e-05, 'epoch': 0.32}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2451
[2024-07-31 09:53:19,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.39 | bwd_microstep: 5399.66 | bwd_inner_microstep: 4987.29 | bwd_allreduce_microstep: 412.30 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2029
[2024-07-31 09:53:28,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.28 | bwd_microstep: 5213.71 | bwd_inner_microstep: 4809.98 | bwd_allreduce_microstep: 403.66 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231
[2024-07-31 09:53:37,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.24 | bwd_microstep: 5170.68 | bwd_inner_microstep: 4768.83 | bwd_allreduce_microstep: 401.79 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195
[2024-07-31 09:53:46,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.77 | bwd_microstep: 5208.63 | bwd_inner_microstep: 4805.71 | bwd_allreduce_microstep: 402.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 09:53:54,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.79 | bwd_microstep: 4838.58 | bwd_inner_microstep: 4798.09 | bwd_allreduce_microstep: 40.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 09:54:03,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.66 | bwd_microstep: 5186.52 | bwd_inner_microstep: 5103.24 | bwd_allreduce_microstep: 83.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719
[2024-07-31 09:54:11,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.00 | bwd_microstep: 4986.59 | bwd_inner_microstep: 4967.17 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701
[2024-07-31 09:54:20,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.86
[2024-07-31 09:54:20,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.77 | bwd_microstep: 4926.47 | bwd_inner_microstep: 4902.20 | bwd_allreduce_microstep: 24.20 | step_microstep: 182.02
[2024-07-31 09:54:20,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28471.81 | bwd: 40930.83 | bwd_inner: 39142.43 | bwd_allreduce: 1787.90 | step: 182.60
32%|███▏ | 394/1230 [7:42:26<16:14:23, 69.93s/it]
{'loss': 1.1685, 'learning_rate': 1.5897009858425383e-05, 'epoch': 0.32}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3980
[2024-07-31 09:54:29,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.79 | bwd_microstep: 5335.10 | bwd_inner_microstep: 5295.98 | bwd_allreduce_microstep: 39.05 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221
[2024-07-31 09:54:38,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.09 | bwd_microstep: 5154.16 | bwd_inner_microstep: 4754.79 | bwd_allreduce_microstep: 399.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 09:54:47,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.75 | bwd_microstep: 5237.68 | bwd_inner_microstep: 5149.19 | bwd_allreduce_microstep: 88.42 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3600
[2024-07-31 09:54:56,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.39 | bwd_microstep: 5164.59 | bwd_inner_microstep: 5065.25 | bwd_allreduce_microstep: 99.27 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3741
[2024-07-31 09:55:04,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3366.68 | bwd_microstep: 5047.04 | bwd_inner_microstep: 4983.21 | bwd_allreduce_microstep: 63.77 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723
[2024-07-31 09:55:13,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.84 | bwd_microstep: 4978.82 | bwd_inner_microstep: 4959.46 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639
[2024-07-31 09:55:21,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.43 | bwd_microstep: 5068.41 | bwd_inner_microstep: 5005.81 | bwd_allreduce_microstep: 62.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683
[2024-07-31 09:55:30,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 09:55:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.20 | bwd_microstep: 4989.51 | bwd_inner_microstep: 4953.19 | bwd_allreduce_microstep: 36.25 | step_microstep: 181.19
[2024-07-31 09:55:30,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28895.09 | bwd: 40975.28 | bwd_inner: 40166.82 | bwd_allreduce: 807.98 | step: 181.78
32%|███▏ | 395/1230 [7:43:36<16:14:21, 70.01s/it]
{'loss': 1.1846, 'learning_rate': 1.5875721890106574e-05, 'epoch': 0.32}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3914
[2024-07-31 09:55:40,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.81 | bwd_microstep: 5339.85 | bwd_inner_microstep: 5294.91 | bwd_allreduce_microstep: 44.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839
[2024-07-31 09:55:49,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.15 | bwd_microstep: 5554.69 | bwd_inner_microstep: 5458.29 | bwd_allreduce_microstep: 96.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612
[2024-07-31 09:55:58,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.99 | bwd_microstep: 5196.51 | bwd_inner_microstep: 5114.78 | bwd_allreduce_microstep: 81.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616
[2024-07-31 09:56:06,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.10 | bwd_microstep: 5165.80 | bwd_inner_microstep: 5086.54 | bwd_allreduce_microstep: 79.19 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676
[2024-07-31 09:56:15,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.00 | bwd_microstep: 4891.65 | bwd_inner_microstep: 4867.82 | bwd_allreduce_microstep: 23.76 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701
[2024-07-31 09:56:24,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.31 | bwd_microstep: 5154.91 | bwd_inner_microstep: 5086.90 | bwd_allreduce_microstep: 67.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 09:56:33,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.32 | bwd_microstep: 5061.93 | bwd_inner_microstep: 5004.41 | bwd_allreduce_microstep: 57.45 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 09:56:41,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 09:56:41,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.24 | bwd_microstep: 4914.19 | bwd_inner_microstep: 4890.52 | bwd_allreduce_microstep: 23.60 | step_microstep: 182.34
[2024-07-31 09:56:41,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29363.83 | bwd: 41279.51 | bwd_inner: 40804.11 | bwd_allreduce: 474.91 | step: 182.94
32%|███▏ | 396/1230 [7:44:47<16:17:13, 70.30s/it]
{'loss': 1.2156, 'learning_rate': 1.5854393176270205e-05, 'epoch': 0.32}
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3653
[2024-07-31 09:56:51,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.21 | bwd_microstep: 5392.15 | bwd_inner_microstep: 5309.25 | bwd_allreduce_microstep: 82.84 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255
[2024-07-31 09:56:59,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.22 | bwd_microstep: 5279.62 | bwd_inner_microstep: 4869.32 | bwd_allreduce_microstep: 410.22 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2243
[2024-07-31 09:57:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.29 | bwd_microstep: 5257.91 | bwd_inner_microstep: 4851.37 | bwd_allreduce_microstep: 406.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189
[2024-07-31 09:57:17,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.28 | bwd_microstep: 5233.27 | bwd_inner_microstep: 4825.76 | bwd_allreduce_microstep: 407.44 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672
[2024-07-31 09:57:26,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.06 | bwd_microstep: 4872.33 | bwd_inner_microstep: 4853.00 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.14
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 09:57:34,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.20 | bwd_microstep: 5094.76 | bwd_inner_microstep: 4698.29 | bwd_allreduce_microstep: 396.40 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3676
[2024-07-31 09:57:43,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.94 | bwd_microstep: 5082.68 | bwd_inner_microstep: 5012.49 | bwd_allreduce_microstep: 70.12 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 09:57:52,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 09:57:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.76 | bwd_microstep: 5018.66 | bwd_inner_microstep: 4964.98 | bwd_allreduce_microstep: 53.61 | step_microstep: 202.50
[2024-07-31 09:57:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28715.86 | bwd: 41231.35 | bwd_inner: 39384.39 | bwd_allreduce: 1846.45 | step: 203.23
32%|███▏ | 397/1230 [7:45:58<16:16:01, 70.30s/it]
{'loss': 1.216, 'learning_rate': 1.5833023864821427e-05, 'epoch': 0.32}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4028
[2024-07-31 09:58:01,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3893.15 | bwd_microstep: 5446.57 | bwd_inner_microstep: 5409.21 | bwd_allreduce_microstep: 37.29 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3712
[2024-07-31 09:58:10,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.46 | bwd_microstep: 5299.25 | bwd_inner_microstep: 5222.51 | bwd_allreduce_microstep: 76.68 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3878
[2024-07-31 09:58:19,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.96 | bwd_microstep: 5043.90 | bwd_inner_microstep: 5016.10 | bwd_allreduce_microstep: 27.74 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3865
[2024-07-31 09:58:28,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3803.17 | bwd_microstep: 5123.60 | bwd_inner_microstep: 5104.31 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701
[2024-07-31 09:58:36,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.69 | bwd_microstep: 4822.87 | bwd_inner_microstep: 4786.73 | bwd_allreduce_microstep: 36.07 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710
[2024-07-31 09:58:44,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.03 | bwd_microstep: 4896.54 | bwd_inner_microstep: 4877.24 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684
[2024-07-31 09:58:53,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.01 | bwd_microstep: 5061.14 | bwd_inner_microstep: 5002.90 | bwd_allreduce_microstep: 58.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 09:59:02,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 09:59:02,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.02 | bwd_microstep: 5136.16 | bwd_inner_microstep: 5062.47 | bwd_allreduce_microstep: 73.63 | step_microstep: 181.34
[2024-07-31 09:59:02,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29099.41 | bwd: 40830.01 | bwd_inner: 40481.41 | bwd_allreduce: 348.13 | step: 181.93
32%|███▏ | 398/1230 [7:47:08<16:14:43, 70.29s/it]
{'loss': 1.2288, 'learning_rate': 1.5811614103946905e-05, 'epoch': 0.32}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3985
[2024-07-31 09:59:11,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.86 | bwd_microstep: 5586.32 | bwd_inner_microstep: 5493.99 | bwd_allreduce_microstep: 92.25 | step_microstep: 0.08
dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1261
[2024-07-31 09:59:20,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.58 | bwd_microstep: 5042.60 | bwd_inner_microstep: 4657.03 | bwd_allreduce_microstep: 385.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 09:59:28,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.56 | bwd_microstep: 5000.86 | bwd_inner_microstep: 4981.50 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723
[2024-07-31 09:59:37,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.24 | bwd_microstep: 5167.69 | bwd_inner_microstep: 5111.25 | bwd_allreduce_microstep: 56.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 09:59:46,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3384.49 | bwd_microstep: 4919.42 | bwd_inner_microstep: 4873.43 | bwd_allreduce_microstep: 45.93 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2126
[2024-07-31 09:59:54,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.63 | bwd_microstep: 5113.73 | bwd_inner_microstep: 4715.28 | bwd_allreduce_microstep: 398.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 10:00:03,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.72 | bwd_microstep: 5037.78 | bwd_inner_microstep: 4982.22 | bwd_allreduce_microstep: 55.49 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1659
[2024-07-31 10:00:12,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 10:00:12,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.26 | bwd_microstep: 5145.01 | bwd_inner_microstep: 4748.25 | bwd_allreduce_microstep: 396.69 | step_microstep: 181.97
[2024-07-31 10:00:12,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28350.23 | bwd: 41013.40 | bwd_inner: 39562.90 | bwd_allreduce: 1450.01 | step: 182.54
32%|███▏ | 399/1230 [7:48:18<16:11:03, 70.11s/it]
{'loss': 1.17, 'learning_rate': 1.5790164042113805e-05, 'epoch': 0.32}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3555
[2024-07-31 10:00:21,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.38 | bwd_microstep: 5324.21 | bwd_inner_microstep: 5221.19 | bwd_allreduce_microstep: 102.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 10:00:29,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3358.24 | bwd_microstep: 5021.47 | bwd_inner_microstep: 4983.24 | bwd_allreduce_microstep: 38.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 10:00:38,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.16 | bwd_microstep: 5146.25 | bwd_inner_microstep: 5074.14 | bwd_allreduce_microstep: 72.03 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774
[2024-07-31 10:00:47,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.38 | bwd_microstep: 5055.44 | bwd_inner_microstep: 5027.68 | bwd_allreduce_microstep: 27.69 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736
[2024-07-31 10:00:55,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.41 | bwd_microstep: 4997.62 | bwd_inner_microstep: 4978.33 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686
[2024-07-31 10:01:04,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.10 | bwd_microstep: 4874.77 | bwd_inner_microstep: 4855.38 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633
[2024-07-31 10:01:12,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.73 | bwd_microstep: 4985.61 | bwd_inner_microstep: 4930.95 | bwd_allreduce_microstep: 54.59 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153
[2024-07-31 10:01:21,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 10:01:21,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.77 | bwd_microstep: 5128.97 | bwd_inner_microstep: 4730.10 | bwd_allreduce_microstep: 398.80 | step_microstep: 181.48
[2024-07-31 10:01:21,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28831.07 | bwd: 40534.32 | bwd_inner: 39800.94 | bwd_allreduce: 732.86 | step: 182.07
33%|███▎ | 400/1230 [7:49:27<16:08:09, 69.99s/it]
{'loss': 1.2223, 'learning_rate': 1.576867382806877e-05, 'epoch': 0.33}
[INFO|trainer.py:2936] 2024-07-31 10:01:48,240 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400
[INFO|configuration_utils.py:473] 2024-07-31 10:01:48,241 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/config.json
[INFO|configuration_utils.py:594] 2024-07-31 10:01:48,242 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/generation_config.json
[INFO|modeling_utils.py:2501] 2024-07-31 10:02:41,749 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2433] 2024-07-31 10:02:41,750 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-07-31 10:02:41,751 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-07-31 10:02:41,751 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/added_tokens.json
[2024-07-31 10:02:43,417] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step400 is about to be saved!
[2024-07-31 10:02:43,727] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-31 10:02:43,728] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-31 10:02:45,428] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-31 10:02:45,982] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-31 10:03:46,800] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-31 10:03:46,800] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-07-31 10:03:46,829] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step400 is ready now!
[INFO|trainer.py:3028] 2024-07-31 10:03:46,861 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/checkpoint-200] due to args.save_total_limit
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3826
[2024-07-31 10:04:28,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.04 | bwd_microstep: 5522.47 | bwd_inner_microstep: 5427.55 | bwd_allreduce_microstep: 94.84 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320
[2024-07-31 10:04:37,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.63 | bwd_microstep: 5131.28 | bwd_inner_microstep: 4731.15 | bwd_allreduce_microstep: 400.06 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4008
[2024-07-31 10:04:46,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.27 | bwd_microstep: 5234.19 | bwd_inner_microstep: 5214.85 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753
[2024-07-31 10:04:55,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.27 | bwd_microstep: 5098.29 | bwd_inner_microstep: 5048.08 | bwd_allreduce_microstep: 50.14 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154
[2024-07-31 10:05:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.33 | bwd_microstep: 5383.30 | bwd_inner_microstep: 4860.23 | bwd_allreduce_microstep: 523.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 10:05:11,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.76 | bwd_microstep: 4763.05 | bwd_inner_microstep: 4743.64 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2106
[2024-07-31 10:05:20,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.41 | bwd_microstep: 5137.07 | bwd_inner_microstep: 4737.46 | bwd_allreduce_microstep: 399.54 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665
[2024-07-31 10:05:29,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 10:05:29,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.45 | bwd_microstep: 4982.40 | bwd_inner_microstep: 4937.21 | bwd_allreduce_microstep: 45.13 | step_microstep: 182.88
[2024-07-31 10:05:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28270.05 | bwd: 41252.04 | bwd_inner: 39700.11 | bwd_allreduce: 1551.42 | step: 183.59
33%|███▎ | 401/1230 [7:53:35<28:22:35, 123.23s/it]
{'loss': 1.1373, 'learning_rate': 1.5747143610836873e-05, 'epoch': 0.33}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3949
[2024-07-31 10:05:38,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.64 | bwd_microstep: 5200.17 | bwd_inner_microstep: 5179.57 | bwd_allreduce_microstep: 20.54 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2239
[2024-07-31 10:05:47,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.47 | bwd_microstep: 5343.30 | bwd_inner_microstep: 4928.81 | bwd_allreduce_microstep: 414.41 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787
[2024-07-31 10:05:56,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.68 | bwd_microstep: 5196.30 | bwd_inner_microstep: 5142.53 | bwd_allreduce_microstep: 53.71 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2079
[2024-07-31 10:06:04,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.37 | bwd_microstep: 4990.62 | bwd_inner_microstep: 4604.72 | bwd_allreduce_microstep: 385.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701
[2024-07-31 10:06:12,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.34 | bwd_microstep: 5110.09 | bwd_inner_microstep: 5041.86 | bwd_allreduce_microstep: 68.16 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186
[2024-07-31 10:06:21,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.54 | bwd_microstep: 5122.40 | bwd_inner_microstep: 4725.71 | bwd_allreduce_microstep: 396.61 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 10:06:30,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.60 | bwd_microstep: 4947.10 | bwd_inner_microstep: 4914.83 | bwd_allreduce_microstep: 32.21 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681
[2024-07-31 10:06:38,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 10:06:38,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.98 | bwd_microstep: 4992.20 | bwd_inner_microstep: 4941.62 | bwd_allreduce_microstep: 50.51 | step_microstep: 182.71
[2024-07-31 10:06:38,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28332.53 | bwd: 40902.16 | bwd_inner: 39479.58 | bwd_allreduce: 1422.08 | step: 183.32
33%|███▎ | 402/1230 [7:54:44<24:38:23, 107.13s/it]
{'loss': 1.2559, 'learning_rate': 1.572557353972059e-05, 'epoch': 0.33}
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2418
[2024-07-31 10:06:47,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.08 | bwd_microstep: 5420.68 | bwd_inner_microstep: 5003.20 | bwd_allreduce_microstep: 417.41 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3842
[2024-07-31 10:06:56,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.21 | bwd_microstep: 5229.60 | bwd_inner_microstep: 5174.45 | bwd_allreduce_microstep: 55.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 10:07:05,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.55 | bwd_microstep: 5141.68 | bwd_inner_microstep: 5064.77 | bwd_allreduce_microstep: 76.84 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 10:07:14,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.48 | bwd_microstep: 5115.76 | bwd_inner_microstep: 5071.24 | bwd_allreduce_microstep: 44.45 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755
[2024-07-31 10:07:23,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.24 | bwd_microstep: 5172.14 | bwd_inner_microstep: 5115.89 | bwd_allreduce_microstep: 56.19 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 10:07:31,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.99 | bwd_microstep: 5270.39 | bwd_inner_microstep: 4860.94 | bwd_allreduce_microstep: 409.38 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166
[2024-07-31 10:07:40,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.00 | bwd_microstep: 5165.67 | bwd_inner_microstep: 4764.08 | bwd_allreduce_microstep: 401.51 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160
[2024-07-31 10:07:49,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 10:07:49,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.07 | bwd_microstep: 5058.51 | bwd_inner_microstep: 4665.24 | bwd_allreduce_microstep: 393.20 | step_microstep: 182.59
[2024-07-31 10:07:49,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28600.54 | bwd: 41574.40 | bwd_inner: 39719.76 | bwd_allreduce: 1854.16 | step: 183.19
33%|███▎ | 403/1230 [7:55:55<22:05:09, 96.14s/it]
{'loss': 1.2033, 'learning_rate': 1.570396376429877e-05, 'epoch': 0.33}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570
[2024-07-31 10:07:57,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3245.85 | bwd_microstep: 5122.94 | bwd_inner_microstep: 5046.43 | bwd_allreduce_microstep: 76.44 | step_microstep: 0.20
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605
[2024-07-31 10:08:06,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.03 | bwd_microstep: 5167.82 | bwd_inner_microstep: 5090.74 | bwd_allreduce_microstep: 77.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 10:08:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.10 | bwd_microstep: 5205.17 | bwd_inner_microstep: 5121.06 | bwd_allreduce_microstep: 84.04 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832
[2024-07-31 10:08:24,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.15 | bwd_microstep: 5056.24 | bwd_inner_microstep: 5035.09 | bwd_allreduce_microstep: 21.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732
[2024-07-31 10:08:32,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.60 | bwd_microstep: 5151.11 | bwd_inner_microstep: 5097.08 | bwd_allreduce_microstep: 53.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 10:08:41,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.41 | bwd_microstep: 5115.31 | bwd_inner_microstep: 4718.00 | bwd_allreduce_microstep: 397.23 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 10:08:50,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.73 | bwd_microstep: 5209.44 | bwd_inner_microstep: 5130.28 | bwd_allreduce_microstep: 79.10 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 10:08:59,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 10:08:59,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.48 | bwd_microstep: 5009.78 | bwd_inner_microstep: 4953.97 | bwd_allreduce_microstep: 55.73 | step_microstep: 181.48
[2024-07-31 10:08:59,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28527.25 | bwd: 41037.79 | bwd_inner: 40192.59 | bwd_allreduce: 844.70 | step: 182.20
33%|███▎ | 404/1230 [7:57:05<20:15:10, 88.27s/it]
{'loss': 1.2259, 'learning_rate': 1.5682314434425593e-05, 'epoch': 0.33}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 10:09:08,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.07 | bwd_microstep: 5437.39 | bwd_inner_microstep: 5397.48 | bwd_allreduce_microstep: 39.83 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3832
[2024-07-31 10:09:17,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.74 | bwd_microstep: 5131.40 | bwd_inner_microstep: 5087.43 | bwd_allreduce_microstep: 43.91 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 10:09:26,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.33 | bwd_microstep: 5242.71 | bwd_inner_microstep: 5194.76 | bwd_allreduce_microstep: 47.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629
[2024-07-31 10:09:34,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3352.59 | bwd_microstep: 5043.94 | bwd_inner_microstep: 4983.95 | bwd_allreduce_microstep: 59.93 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745
[2024-07-31 10:09:43,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.77 | bwd_microstep: 5025.52 | bwd_inner_microstep: 5000.27 | bwd_allreduce_microstep: 25.18 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2105
[2024-07-31 10:09:52,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.31 | bwd_microstep: 5059.48 | bwd_inner_microstep: 4667.19 | bwd_allreduce_microstep: 392.22 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3257
[2024-07-31 10:10:00,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3070.38 | bwd_microstep: 4871.50 | bwd_inner_microstep: 4768.47 | bwd_allreduce_microstep: 102.97 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2710
[2024-07-31 10:10:08,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 10:10:08,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.13 | bwd_microstep: 5069.34 | bwd_inner_microstep: 4671.93 | bwd_allreduce_microstep: 397.34 | step_microstep: 181.71
[2024-07-31 10:10:08,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28317.22 | bwd: 40881.26 | bwd_inner: 39771.41 | bwd_allreduce: 1109.35 | step: 182.42
33%|███▎ | 405/1230 [7:58:14<18:56:25, 82.65s/it]
{'loss': 1.1371, 'learning_rate': 1.5660625700229526e-05, 'epoch': 0.33}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2276
[2024-07-31 10:10:18,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.15 | bwd_microstep: 5532.27 | bwd_inner_microstep: 5104.84 | bwd_allreduce_microstep: 427.36 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2297
[2024-07-31 10:10:27,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.42 | bwd_microstep: 5500.01 | bwd_inner_microstep: 5077.64 | bwd_allreduce_microstep: 422.30 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3856
[2024-07-31 10:10:36,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.03 | bwd_microstep: 5103.49 | bwd_inner_microstep: 5084.05 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165
[2024-07-31 10:10:44,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.47 | bwd_microstep: 5232.66 | bwd_inner_microstep: 4824.89 | bwd_allreduce_microstep: 407.70 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698
[2024-07-31 10:10:53,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.17 | bwd_microstep: 4957.89 | bwd_inner_microstep: 4922.16 | bwd_allreduce_microstep: 35.66 | step_microstep: 0.10
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1134
[2024-07-31 10:11:02,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.38 | bwd_microstep: 5213.91 | bwd_inner_microstep: 4812.98 | bwd_allreduce_microstep: 400.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631
[2024-07-31 10:11:10,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.17 | bwd_microstep: 5019.80 | bwd_inner_microstep: 4961.22 | bwd_allreduce_microstep: 58.51 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675
[2024-07-31 10:11:19,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 10:11:19,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.18 | bwd_microstep: 4867.62 | bwd_inner_microstep: 4848.24 | bwd_allreduce_microstep: 19.30 | step_microstep: 181.48
[2024-07-31 10:11:19,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29063.87 | bwd: 41427.63 | bwd_inner: 39635.97 | bwd_allreduce: 1791.17 | step: 182.19
33%|███▎ | 406/1230 [7:59:25<18:06:18, 79.10s/it]
{'loss': 1.1417, 'learning_rate': 1.5638897712112303e-05, 'epoch': 0.33}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3946
[2024-07-31 10:11:28,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3829.10 | bwd_microstep: 5307.28 | bwd_inner_microstep: 5273.67 | bwd_allreduce_microstep: 33.54 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881
[2024-07-31 10:11:37,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.25 | bwd_microstep: 5170.37 | bwd_inner_microstep: 5128.25 | bwd_allreduce_microstep: 42.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579
[2024-07-31 10:11:46,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.61 | bwd_microstep: 5215.32 | bwd_inner_microstep: 5124.56 | bwd_allreduce_microstep: 90.69 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627
[2024-07-31 10:11:55,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.37 | bwd_microstep: 5323.72 | bwd_inner_microstep: 5227.03 | bwd_allreduce_microstep: 96.62 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2168
[2024-07-31 10:12:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.33 | bwd_microstep: 5148.62 | bwd_inner_microstep: 4746.35 | bwd_allreduce_microstep: 402.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747
[2024-07-31 10:12:13,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.20 | bwd_microstep: 5303.33 | bwd_inner_microstep: 5229.87 | bwd_allreduce_microstep: 73.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 10:12:21,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.66 | bwd_microstep: 5043.51 | bwd_inner_microstep: 5017.80 | bwd_allreduce_microstep: 25.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684
[2024-07-31 10:12:30,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80
[2024-07-31 10:12:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.67 | bwd_microstep: 5027.91 | bwd_inner_microstep: 4971.42 | bwd_allreduce_microstep: 56.43 | step_microstep: 181.74
[2024-07-31 10:12:30,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29261.09 | bwd: 41540.04 | bwd_inner: 40718.88 | bwd_allreduce: 820.68 | step: 182.35
33%|███▎ | 407/1230 [8:00:36<17:32:12, 76.71s/it]
{'loss': 1.2067, 'learning_rate': 1.561713062074785e-05, 'epoch': 0.33}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2418
[2024-07-31 10:12:39,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.22 | bwd_microstep: 5270.52 | bwd_inner_microstep: 4864.53 | bwd_allreduce_microstep: 405.93 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4001
[2024-07-31 10:12:48,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.41 | bwd_microstep: 5107.58 | bwd_inner_microstep: 5087.87 | bwd_allreduce_microstep: 19.64 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3930
[2024-07-31 10:12:57,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.86 | bwd_microstep: 5287.10 | bwd_inner_microstep: 5231.68 | bwd_allreduce_microstep: 55.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744
[2024-07-31 10:13:06,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.47 | bwd_microstep: 5057.38 | bwd_inner_microstep: 5028.71 | bwd_allreduce_microstep: 28.60 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799
[2024-07-31 10:13:15,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.65 | bwd_microstep: 5163.39 | bwd_inner_microstep: 5115.12 | bwd_allreduce_microstep: 48.20 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 10:13:23,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.87 | bwd_microstep: 4987.38 | bwd_inner_microstep: 4967.54 | bwd_allreduce_microstep: 19.77 | step_microstep: 0.18
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2126
[2024-07-31 10:13:32,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.82 | bwd_microstep: 5175.16 | bwd_inner_microstep: 4773.26 | bwd_allreduce_microstep: 401.84 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656
[2024-07-31 10:13:41,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 10:13:41,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.63 | bwd_microstep: 5016.21 | bwd_inner_microstep: 4962.62 | bwd_allreduce_microstep: 53.52 | step_microstep: 182.48
[2024-07-31 10:13:41,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29141.83 | bwd: 41064.71 | bwd_inner: 40031.25 | bwd_allreduce: 1032.96 | step: 183.20
33%|███▎ | 408/1230 [8:01:47<17:05:34, 74.86s/it]
{'loss': 1.2265, 'learning_rate': 1.5595324577081262e-05, 'epoch': 0.33}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 10:13:50,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.87 | bwd_microstep: 5331.52 | bwd_inner_microstep: 5312.50 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3862
[2024-07-31 10:13:59,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.76 | bwd_microstep: 5102.86 | bwd_inner_microstep: 5083.48 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2278
[2024-07-31 10:14:08,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.61 | bwd_microstep: 5196.38 | bwd_inner_microstep: 4789.48 | bwd_allreduce_microstep: 406.84 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 690
[2024-07-31 10:14:16,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.17 | bwd_microstep: 5266.27 | bwd_inner_microstep: 4860.69 | bwd_allreduce_microstep: 405.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 10:14:25,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.02 | bwd_microstep: 4980.04 | bwd_inner_microstep: 4960.69 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764
[2024-07-31 10:14:34,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.54 | bwd_microstep: 4999.26 | bwd_inner_microstep: 4979.89 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3664
[2024-07-31 10:14:42,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.63 | bwd_microstep: 4868.04 | bwd_inner_microstep: 4848.65 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161
[2024-07-31 10:14:51,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 10:14:51,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.16 | bwd_microstep: 5120.45 | bwd_inner_microstep: 4723.31 | bwd_allreduce_microstep: 397.08 | step_microstep: 182.16
[2024-07-31 10:14:51,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29270.65 | bwd: 40864.80 | bwd_inner: 39558.62 | bwd_allreduce: 1305.69 | step: 182.73
33%|███▎ | 409/1230 [8:02:57<16:46:18, 73.54s/it]
{'loss': 1.1781, 'learning_rate': 1.5573479732327758e-05, 'epoch': 0.33}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610
[2024-07-31 10:15:00,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.24 | bwd_microstep: 5282.55 | bwd_inner_microstep: 5197.39 | bwd_allreduce_microstep: 85.10 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170
[2024-07-31 10:15:09,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.69 | bwd_microstep: 5237.40 | bwd_inner_microstep: 4832.17 | bwd_allreduce_microstep: 405.15 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597
[2024-07-31 10:15:18,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.86 | bwd_microstep: 5154.22 | bwd_inner_microstep: 5073.80 | bwd_allreduce_microstep: 80.36 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661
[2024-07-31 10:15:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.93 | bwd_microstep: 5129.14 | bwd_inner_microstep: 5057.34 | bwd_allreduce_microstep: 71.73 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 10:15:35,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.10 | bwd_microstep: 5183.62 | bwd_inner_microstep: 5126.13 | bwd_allreduce_microstep: 57.42 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3176
[2024-07-31 10:15:44,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.90 | bwd_microstep: 5200.71 | bwd_inner_microstep: 4872.71 | bwd_allreduce_microstep: 327.93 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152
[2024-07-31 10:15:53,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.15 | bwd_microstep: 5119.60 | bwd_inner_microstep: 4721.46 | bwd_allreduce_microstep: 398.07 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156
[2024-07-31 10:16:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 10:16:02,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.01 | bwd_microstep: 5112.22 | bwd_inner_microstep: 4715.26 | bwd_allreduce_microstep: 396.89 | step_microstep: 181.86
[2024-07-31 10:16:02,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28631.81 | bwd: 41419.46 | bwd_inner: 39596.20 | bwd_allreduce: 1822.75 | step: 182.58
33%|███▎ | 410/1230 [8:04:08<16:32:07, 72.59s/it]
{'loss': 1.1462, 'learning_rate': 1.555159623797161e-05, 'epoch': 0.33}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3550
[2024-07-31 10:16:11,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.33 | bwd_microstep: 5602.68 | bwd_inner_microstep: 5397.13 | bwd_allreduce_microstep: 205.48 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585
[2024-07-31 10:16:19,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.01 | bwd_microstep: 4732.40 | bwd_inner_microstep: 4700.59 | bwd_allreduce_microstep: 31.75 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3741
[2024-07-31 10:16:28,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.91 | bwd_microstep: 5117.44 | bwd_inner_microstep: 5063.72 | bwd_allreduce_microstep: 53.65 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3833
[2024-07-31 10:16:37,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.52 | bwd_microstep: 5272.24 | bwd_inner_microstep: 5185.58 | bwd_allreduce_microstep: 86.59 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2248
[2024-07-31 10:16:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3061.11 | bwd_microstep: 5025.80 | bwd_inner_microstep: 4638.09 | bwd_allreduce_microstep: 387.64 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2167
[2024-07-31 10:16:53,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.76 | bwd_microstep: 5099.98 | bwd_inner_microstep: 4704.09 | bwd_allreduce_microstep: 395.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650
[2024-07-31 10:17:02,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.44 | bwd_microstep: 5097.81 | bwd_inner_microstep: 5029.49 | bwd_allreduce_microstep: 68.25 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671
[2024-07-31 10:17:11,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71
[2024-07-31 10:17:11,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.19 | bwd_microstep: 4914.69 | bwd_inner_microstep: 4889.93 | bwd_allreduce_microstep: 24.69 | step_microstep: 181.99
[2024-07-31 10:17:11,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28047.18 | bwd: 40863.02 | bwd_inner: 39608.55 | bwd_allreduce: 1253.98 | step: 182.59
33%|███▎ | 411/1230 [8:05:17<16:17:10, 71.59s/it]
{'loss': 1.1876, 'learning_rate': 1.552967424576512e-05, 'epoch': 0.33}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3953
[2024-07-31 10:17:20,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.67 | bwd_microstep: 5373.16 | bwd_inner_microstep: 5314.77 | bwd_allreduce_microstep: 58.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3871
[2024-07-31 10:17:28,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3283.22 | bwd_microstep: 4910.40 | bwd_inner_microstep: 4891.11 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603
[2024-07-31 10:17:36,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.32 | bwd_microstep: 4879.94 | bwd_inner_microstep: 4827.67 | bwd_allreduce_microstep: 52.20 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 10:17:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.99 | bwd_microstep: 5025.92 | bwd_inner_microstep: 4986.02 | bwd_allreduce_microstep: 39.83 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723
[2024-07-31 10:17:53,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.06 | bwd_microstep: 4788.61 | bwd_inner_microstep: 4769.16 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08
dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2918
[2024-07-31 10:18:02,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.19 | bwd_microstep: 5031.91 | bwd_inner_microstep: 4640.51 | bwd_allreduce_microstep: 391.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684
[2024-07-31 10:18:10,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.52 | bwd_microstep: 4890.36 | bwd_inner_microstep: 4870.93 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697
[2024-07-31 10:18:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 10:18:19,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.60 | bwd_microstep: 5089.29 | bwd_inner_microstep: 5023.44 | bwd_allreduce_microstep: 65.78 | step_microstep: 182.47
[2024-07-31 10:18:19,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27807.47 | bwd: 39989.57 | bwd_inner: 39323.55 | bwd_allreduce: 665.52 | step: 183.07
33%|███▎ | 412/1230 [8:06:25<16:01:50, 70.55s/it]
{'loss': 1.1872, 'learning_rate': 1.5507713907727557e-05, 'epoch': 0.33}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3647
[2024-07-31 10:18:28,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.26 | bwd_microstep: 5492.00 | bwd_inner_microstep: 5309.18 | bwd_allreduce_microstep: 182.76 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730
[2024-07-31 10:18:37,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.93 | bwd_microstep: 5086.41 | bwd_inner_microstep: 5056.11 | bwd_allreduce_microstep: 30.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807
[2024-07-31 10:18:46,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.76 | bwd_microstep: 5033.57 | bwd_inner_microstep: 5014.26 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2905
[2024-07-31 10:18:55,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.18 | bwd_microstep: 5160.97 | bwd_inner_microstep: 4759.25 | bwd_allreduce_microstep: 401.65 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3574
[2024-07-31 10:19:03,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.81 | bwd_microstep: 5076.17 | bwd_inner_microstep: 4978.86 | bwd_allreduce_microstep: 97.24 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3719
[2024-07-31 10:19:12,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.93 | bwd_microstep: 5179.01 | bwd_inner_microstep: 5103.05 | bwd_allreduce_microstep: 75.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703
[2024-07-31 10:19:21,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.57 | bwd_microstep: 5191.98 | bwd_inner_microstep: 5069.07 | bwd_allreduce_microstep: 122.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684
[2024-07-31 10:19:30,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 10:19:30,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.46 | bwd_microstep: 5024.77 | bwd_inner_microstep: 4969.42 | bwd_allreduce_microstep: 55.29 | step_microstep: 181.31
[2024-07-31 10:19:30,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29183.81 | bwd: 41244.87 | bwd_inner: 40259.13 | bwd_allreduce: 985.24 | step: 181.90
34%|███▎ | 413/1230 [8:07:36<16:01:31, 70.61s/it]
{'loss': 1.181, 'learning_rate': 1.5485715376144087e-05, 'epoch': 0.34}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3949
[2024-07-31 10:19:39,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.37 | bwd_microstep: 5311.99 | bwd_inner_microstep: 5254.82 | bwd_allreduce_microstep: 57.11 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2325
[2024-07-31 10:19:48,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.23 | bwd_microstep: 5300.99 | bwd_inner_microstep: 4892.78 | bwd_allreduce_microstep: 408.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747
[2024-07-31 10:19:56,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.55 | bwd_microstep: 5168.59 | bwd_inner_microstep: 5111.63 | bwd_allreduce_microstep: 56.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3658
[2024-07-31 10:20:05,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.15 | bwd_microstep: 5029.04 | bwd_inner_microstep: 4987.69 | bwd_allreduce_microstep: 41.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 10:20:14,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.24 | bwd_microstep: 5237.84 | bwd_inner_microstep: 4830.66 | bwd_allreduce_microstep: 407.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808
[2024-07-31 10:20:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.94 | bwd_microstep: 5040.22 | bwd_inner_microstep: 5020.90 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 10:20:32,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.84 | bwd_microstep: 5024.45 | bwd_inner_microstep: 4986.37 | bwd_allreduce_microstep: 38.02 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2130
[2024-07-31 10:20:40,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 10:20:40,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.69 | bwd_microstep: 5105.86 | bwd_inner_microstep: 4708.83 | bwd_allreduce_microstep: 396.96 | step_microstep: 182.40
[2024-07-31 10:20:40,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29149.91 | bwd: 41218.96 | bwd_inner: 39793.62 | bwd_allreduce: 1424.86 | step: 182.99
34%|███▎ | 414/1230 [8:08:46<16:00:43, 70.64s/it]
{'loss': 1.175, 'learning_rate': 1.5463678803564753e-05, 'epoch': 0.34}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249
[2024-07-31 10:20:50,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.43 | bwd_microstep: 5531.09 | bwd_inner_microstep: 5104.29 | bwd_allreduce_microstep: 426.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 10:20:58,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.37 | bwd_microstep: 4883.76 | bwd_inner_microstep: 4837.07 | bwd_allreduce_microstep: 46.62 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604
[2024-07-31 10:21:07,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.91 | bwd_microstep: 5189.09 | bwd_inner_microstep: 5106.92 | bwd_allreduce_microstep: 82.10 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752
[2024-07-31 10:21:15,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.29 | bwd_microstep: 5151.96 | bwd_inner_microstep: 5099.41 | bwd_allreduce_microstep: 52.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188
[2024-07-31 10:21:24,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.49 | bwd_microstep: 5259.13 | bwd_inner_microstep: 4852.32 | bwd_allreduce_microstep: 406.75 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171
[2024-07-31 10:21:33,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.83 | bwd_microstep: 5161.72 | bwd_inner_microstep: 4759.44 | bwd_allreduce_microstep: 402.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662
[2024-07-31 10:21:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.37 | bwd_microstep: 5062.32 | bwd_inner_microstep: 4999.49 | bwd_allreduce_microstep: 62.76 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673
[2024-07-31 10:21:50,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 10:21:50,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.31 | bwd_microstep: 5017.75 | bwd_inner_microstep: 4963.35 | bwd_allreduce_microstep: 54.33 | step_microstep: 182.35 [2024-07-31 10:21:50,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28292.92 | bwd: 41256.81 | bwd_inner: 39722.23 | bwd_allreduce: 1534.09 | step: 183.05 34%|███▎ | 415/1230 [8:09:56<15:56:26, 70.41s/it] {'loss': 1.1791, 'learning_rate': 1.544160434280337e-05, 'epoch': 0.34} 34%|███▎ | 415/1230 [8:09:56<15:56:26, 70.41s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3985 [2024-07-31 10:22:00,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.10 | bwd_microstep: 5486.81 | bwd_inner_microstep: 5429.51 | bwd_allreduce_microstep: 57.23 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3809 [2024-07-31 10:22:08,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.16 | bwd_microstep: 5128.73 | bwd_inner_microstep: 5091.73 | bwd_allreduce_microstep: 36.93 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2071 [2024-07-31 10:22:17,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.51 | bwd_microstep: 5269.10 | bwd_inner_microstep: 4859.33 | bwd_allreduce_microstep: 409.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 10:22:26,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.90 | bwd_microstep: 5075.35 | bwd_inner_microstep: 5045.11 | bwd_allreduce_microstep: 30.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-31 10:22:35,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.83 | bwd_microstep: 5239.52 | bwd_inner_microstep: 5177.69 | bwd_allreduce_microstep: 
61.76 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2882 [2024-07-31 10:22:44,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.60 | bwd_microstep: 5146.20 | bwd_inner_microstep: 4746.97 | bwd_allreduce_microstep: 399.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 10:22:52,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.01 | bwd_microstep: 4892.29 | bwd_inner_microstep: 4872.94 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3708 [2024-07-31 10:23:01,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 10:23:01,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.90 | bwd_microstep: 5063.57 | bwd_inner_microstep: 4989.69 | bwd_allreduce_microstep: 73.81 | step_microstep: 182.21 [2024-07-31 10:23:01,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29190.91 | bwd: 41301.54 | bwd_inner: 40212.92 | bwd_allreduce: 1088.14 | step: 182.80 34%|███▍ | 416/1230 [8:11:07<15:56:58, 70.54s/it] {'loss': 1.19, 'learning_rate': 1.5419492146936518e-05, 'epoch': 0.34} 34%|███▍ | 416/1230 [8:11:07<15:56:58, 70.54s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 10:23:10,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3867.27 | bwd_microstep: 5388.74 | bwd_inner_microstep: 5369.72 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-31 10:23:19,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.53 | bwd_microstep: 5141.17 | bwd_inner_microstep: 5096.07 | bwd_allreduce_microstep: 45.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3782 [2024-07-31 10:23:28,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.29 | bwd_microstep: 5026.07 | bwd_inner_microstep: 5006.66 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3788 [2024-07-31 10:23:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.19 | bwd_microstep: 5102.37 | bwd_inner_microstep: 5042.08 | bwd_allreduce_microstep: 60.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 10:23:45,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.89 | bwd_microstep: 4979.48 | bwd_inner_microstep: 4947.53 | bwd_allreduce_microstep: 31.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 10:23:53,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3168.74 | bwd_microstep: 4678.31 | bwd_inner_microstep: 4658.94 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 10:24:02,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.41 | bwd_microstep: 5104.12 | bwd_inner_microstep: 5035.69 | bwd_allreduce_microstep: 68.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127 [2024-07-31 10:24:10,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 10:24:10,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3015.22 | bwd_microstep: 4910.96 | bwd_inner_microstep: 4536.59 | bwd_allreduce_microstep: 374.30 | step_microstep: 182.35 [2024-07-31 10:24:10,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28293.44 | bwd: 40331.22 | bwd_inner: 39693.23 | bwd_allreduce: 637.50 | step: 182.94 34%|███▍ | 
417/1230 [8:12:16<15:49:21, 70.06s/it] {'loss': 1.245, 'learning_rate': 1.5397342369302425e-05, 'epoch': 0.34} 34%|███▍ | 417/1230 [8:12:16<15:49:21, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4021 [2024-07-31 10:24:19,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.87 | bwd_microstep: 5190.39 | bwd_inner_microstep: 5168.68 | bwd_allreduce_microstep: 21.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 10:24:28,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.05 | bwd_microstep: 5121.67 | bwd_inner_microstep: 5056.83 | bwd_allreduce_microstep: 64.76 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-31 10:24:36,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.50 | bwd_microstep: 5132.64 | bwd_inner_microstep: 5057.49 | bwd_allreduce_microstep: 75.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-07-31 10:24:45,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.31 | bwd_microstep: 5076.22 | bwd_inner_microstep: 5003.37 | bwd_allreduce_microstep: 72.78 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-31 10:24:54,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.11 | bwd_microstep: 5220.69 | bwd_inner_microstep: 4813.70 | bwd_allreduce_microstep: 406.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 10:25:03,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.87 | bwd_microstep: 5287.47 | bwd_inner_microstep: 4878.80 | bwd_allreduce_microstep: 408.61 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 
2149 [2024-07-31 10:25:11,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.48 | bwd_microstep: 5077.81 | bwd_inner_microstep: 4684.13 | bwd_allreduce_microstep: 393.61 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 10:25:20,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 10:25:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.42 | bwd_microstep: 5031.79 | bwd_inner_microstep: 4991.74 | bwd_allreduce_microstep: 39.98 | step_microstep: 182.41 [2024-07-31 10:25:20,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28634.56 | bwd: 41138.67 | bwd_inner: 39654.68 | bwd_allreduce: 1483.50 | step: 183.12 34%|███▍ | 418/1230 [8:13:26<15:48:21, 70.08s/it] {'loss': 1.1792, 'learning_rate': 1.5375155163499953e-05, 'epoch': 0.34} 34%|███▍ | 418/1230 [8:13:26<15:48:21, 70.08s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3903 [2024-07-31 10:25:29,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.79 | bwd_microstep: 5232.61 | bwd_inner_microstep: 5195.40 | bwd_allreduce_microstep: 37.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-31 10:25:38,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.94 | bwd_microstep: 5262.63 | bwd_inner_microstep: 4852.78 | bwd_allreduce_microstep: 409.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-31 10:25:47,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.46 | bwd_microstep: 5111.87 | bwd_inner_microstep: 5082.56 | bwd_allreduce_microstep: 29.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2099 [2024-07-31 10:25:56,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3547.64 | bwd_microstep: 5267.60 | bwd_inner_microstep: 4859.18 | bwd_allreduce_microstep: 408.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 10:26:05,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.34 | bwd_microstep: 5035.30 | bwd_inner_microstep: 5007.65 | bwd_allreduce_microstep: 27.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 10:26:14,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.91 | bwd_microstep: 5026.62 | bwd_inner_microstep: 5000.52 | bwd_allreduce_microstep: 26.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-31 10:26:22,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.31 | bwd_microstep: 5157.09 | bwd_inner_microstep: 4756.38 | bwd_allreduce_microstep: 400.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-31 10:26:31,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 10:26:31,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.98 | bwd_microstep: 4875.62 | bwd_inner_microstep: 4856.20 | bwd_allreduce_microstep: 19.35 | step_microstep: 182.34 [2024-07-31 10:26:31,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29394.27 | bwd: 40969.30 | bwd_inner: 39610.61 | bwd_allreduce: 1358.20 | step: 182.91 34%|███▍ | 419/1230 [8:14:37<15:49:43, 70.26s/it] {'loss': 1.171, 'learning_rate': 1.5352930683387502e-05, 'epoch': 0.34} 34%|███▍ | 419/1230 [8:14:37<15:49:43, 70.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 10:26:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.94 | bwd_microstep: 5435.30 | bwd_inner_microstep: 5401.39 | 
bwd_allreduce_microstep: 33.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-07-31 10:26:48,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3292.37 | bwd_microstep: 5014.32 | bwd_inner_microstep: 4624.82 | bwd_allreduce_microstep: 389.43 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2840 [2024-07-31 10:26:57,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.38 | bwd_microstep: 5158.38 | bwd_inner_microstep: 4755.83 | bwd_allreduce_microstep: 402.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-31 10:27:06,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.13 | bwd_microstep: 5162.67 | bwd_inner_microstep: 5078.84 | bwd_allreduce_microstep: 83.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765 [2024-07-31 10:27:15,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.83 | bwd_microstep: 5000.16 | bwd_inner_microstep: 4980.79 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-31 10:27:23,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.56 | bwd_microstep: 4900.24 | bwd_inner_microstep: 4880.86 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-31 10:27:32,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.51 | bwd_microstep: 4869.11 | bwd_inner_microstep: 4849.69 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2120 [2024-07-31 10:27:41,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 
[2024-07-31 10:27:41,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.94 | bwd_microstep: 5069.34 | bwd_inner_microstep: 4675.65 | bwd_allreduce_microstep: 393.63 | step_microstep: 181.94 [2024-07-31 10:27:41,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28771.55 | bwd: 40609.51 | bwd_inner: 39247.81 | bwd_allreduce: 1361.22 | step: 182.52 34%|███▍ | 420/1230 [8:15:47<15:46:20, 70.10s/it] {'loss': 1.1415, 'learning_rate': 1.533066908308196e-05, 'epoch': 0.34} 34%|███▍ | 420/1230 [8:15:47<15:46:20, 70.10s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3502 [2024-07-31 10:27:50,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.76 | bwd_microstep: 5579.55 | bwd_inner_microstep: 5309.05 | bwd_allreduce_microstep: 270.43 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3592 [2024-07-31 10:27:59,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.50 | bwd_microstep: 5151.35 | bwd_inner_microstep: 5051.73 | bwd_allreduce_microstep: 99.55 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3742 [2024-07-31 10:28:08,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.25 | bwd_microstep: 5166.20 | bwd_inner_microstep: 5095.21 | bwd_allreduce_microstep: 70.92 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 10:28:15,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3022.50 | bwd_microstep: 4895.80 | bwd_inner_microstep: 4520.88 | bwd_allreduce_microstep: 374.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3664 [2024-07-31 10:28:24,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.65 | bwd_microstep: 5030.95 | bwd_inner_microstep: 4989.96 | bwd_allreduce_microstep: 
40.91 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 10:28:33,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.76 | bwd_microstep: 5040.11 | bwd_inner_microstep: 5017.65 | bwd_allreduce_microstep: 22.39 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2171 [2024-07-31 10:28:42,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.64 | bwd_microstep: 5196.33 | bwd_inner_microstep: 4792.38 | bwd_allreduce_microstep: 403.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2171 [2024-07-31 10:28:51,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 10:28:51,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.53 | bwd_microstep: 5132.97 | bwd_inner_microstep: 4735.24 | bwd_allreduce_microstep: 397.66 | step_microstep: 181.47 [2024-07-31 10:28:51,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28461.49 | bwd: 41193.24 | bwd_inner: 39512.05 | bwd_allreduce: 1680.70 | step: 182.17 34%|███▍ | 421/1230 [8:16:57<15:44:43, 70.07s/it] {'loss': 1.2402, 'learning_rate': 1.5308370516957617e-05, 'epoch': 0.34} 34%|███▍ | 421/1230 [8:16:57<15:44:43, 70.07s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4020 [2024-07-31 10:29:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.30 | bwd_microstep: 5283.36 | bwd_inner_microstep: 5264.33 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 10:29:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.18 | bwd_microstep: 5216.84 | bwd_inner_microstep: 5128.48 | bwd_allreduce_microstep: 88.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3631 [2024-07-31 10:29:17,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.76 | bwd_microstep: 5128.71 | bwd_inner_microstep: 5053.24 | bwd_allreduce_microstep: 75.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 10:29:26,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.51 | bwd_microstep: 5170.12 | bwd_inner_microstep: 5092.09 | bwd_allreduce_microstep: 77.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 10:29:35,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.59 | bwd_microstep: 5246.36 | bwd_inner_microstep: 4840.17 | bwd_allreduce_microstep: 406.12 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2197 [2024-07-31 10:29:44,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.92 | bwd_microstep: 5179.52 | bwd_inner_microstep: 4778.14 | bwd_allreduce_microstep: 401.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 10:29:53,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.13 | bwd_microstep: 5163.72 | bwd_inner_microstep: 5086.19 | bwd_allreduce_microstep: 77.46 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 10:30:02,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 10:30:02,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.86 | bwd_microstep: 5039.81 | bwd_inner_microstep: 4997.99 | bwd_allreduce_microstep: 41.75 | step_microstep: 181.28 [2024-07-31 10:30:02,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29118.14 | bwd: 41428.41 | bwd_inner: 40240.56 | bwd_allreduce: 1187.36 | step: 181.88 34%|███▍ | 
422/1230 [8:18:07<15:46:51, 70.31s/it] {'loss': 1.1745, 'learning_rate': 1.528603513964511e-05, 'epoch': 0.34} 34%|███▍ | 422/1230 [8:18:07<15:46:51, 70.31s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4038 [2024-07-31 10:30:11,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.52 | bwd_microstep: 5599.58 | bwd_inner_microstep: 5535.37 | bwd_allreduce_microstep: 64.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3837 [2024-07-31 10:30:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.46 | bwd_microstep: 5360.42 | bwd_inner_microstep: 5292.09 | bwd_allreduce_microstep: 68.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 10:30:29,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.48 | bwd_microstep: 5038.61 | bwd_inner_microstep: 5019.17 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3806 [2024-07-31 10:30:38,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.82 | bwd_microstep: 5077.68 | bwd_inner_microstep: 5051.10 | bwd_allreduce_microstep: 26.52 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2073 [2024-07-31 10:30:46,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.18 | bwd_microstep: 5234.90 | bwd_inner_microstep: 4830.57 | bwd_allreduce_microstep: 404.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3701 [2024-07-31 10:30:55,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.36 | bwd_microstep: 5072.67 | bwd_inner_microstep: 4998.29 | bwd_allreduce_microstep: 74.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3704 [2024-07-31 10:31:03,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.46 | bwd_microstep: 4717.62 | bwd_inner_microstep: 4695.24 | bwd_allreduce_microstep: 22.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 10:31:12,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 10:31:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.35 | bwd_microstep: 5058.00 | bwd_inner_microstep: 4997.83 | bwd_allreduce_microstep: 60.10 | step_microstep: 180.92 [2024-07-31 10:31:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28728.54 | bwd: 41159.47 | bwd_inner: 40419.60 | bwd_allreduce: 739.39 | step: 181.51 34%|███▍ | 423/1230 [8:19:18<15:45:18, 70.28s/it] {'loss': 1.1877, 'learning_rate': 1.5263663106030347e-05, 'epoch': 0.34} 34%|███▍ | 423/1230 [8:19:18<15:45:18, 70.28s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2286 [2024-07-31 10:31:21,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.79 | bwd_microstep: 5459.48 | bwd_inner_microstep: 5041.63 | bwd_allreduce_microstep: 417.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3870 [2024-07-31 10:31:30,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.52 | bwd_microstep: 5291.28 | bwd_inner_microstep: 5232.75 | bwd_allreduce_microstep: 58.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3728 [2024-07-31 10:31:39,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.46 | bwd_microstep: 5143.99 | bwd_inner_microstep: 5066.82 | bwd_allreduce_microstep: 77.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 10:31:47,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3551.32 | bwd_microstep: 5205.96 | bwd_inner_microstep: 4800.91 | bwd_allreduce_microstep: 404.98 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-31 10:31:56,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.09 | bwd_microstep: 5130.59 | bwd_inner_microstep: 4733.92 | bwd_allreduce_microstep: 396.61 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3700 [2024-07-31 10:32:05,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.06 | bwd_microstep: 5044.74 | bwd_inner_microstep: 4973.45 | bwd_allreduce_microstep: 71.22 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 10:32:13,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.41 | bwd_microstep: 4987.28 | bwd_inner_microstep: 4933.88 | bwd_allreduce_microstep: 53.33 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2159 [2024-07-31 10:32:22,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 10:32:22,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.82 | bwd_microstep: 5242.59 | bwd_inner_microstep: 4836.30 | bwd_allreduce_microstep: 406.22 | step_microstep: 181.64 [2024-07-31 10:32:22,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28602.36 | bwd: 41505.90 | bwd_inner: 39619.60 | bwd_allreduce: 1885.81 | step: 182.23 34%|███▍ | 424/1230 [8:20:28<15:44:47, 70.33s/it] {'loss': 1.1788, 'learning_rate': 1.5241254571253433e-05, 'epoch': 0.34} 34%|███▍ | 424/1230 [8:20:28<15:44:47, 70.33s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3810 [2024-07-31 10:32:31,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.71 | bwd_microstep: 5343.55 | bwd_inner_microstep: 5286.74 | 
bwd_allreduce_microstep: 56.75 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3583 [2024-07-31 10:32:40,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.71 | bwd_microstep: 5346.06 | bwd_inner_microstep: 5195.33 | bwd_allreduce_microstep: 150.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-31 10:32:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.61 | bwd_microstep: 5011.78 | bwd_inner_microstep: 4992.42 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 10:32:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.41 | bwd_microstep: 5192.05 | bwd_inner_microstep: 5137.66 | bwd_allreduce_microstep: 54.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 10:33:07,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.37 | bwd_microstep: 5212.84 | bwd_inner_microstep: 5122.89 | bwd_allreduce_microstep: 89.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 10:33:15,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.01 | bwd_microstep: 5091.81 | bwd_inner_microstep: 4698.12 | bwd_allreduce_microstep: 393.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 10:33:24,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.74 | bwd_microstep: 5169.38 | bwd_inner_microstep: 5114.96 | bwd_allreduce_microstep: 54.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 10:33:33,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 
[2024-07-31 10:33:33,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.03 | bwd_microstep: 4924.29 | bwd_inner_microstep: 4899.30 | bwd_allreduce_microstep: 24.92 | step_microstep: 182.12 [2024-07-31 10:33:33,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29224.50 | bwd: 41291.73 | bwd_inner: 40447.36 | bwd_allreduce: 843.88 | step: 182.72 35%|███▍ | 425/1230 [8:21:39<15:45:44, 70.49s/it] {'loss': 1.2046, 'learning_rate': 1.5218809690707583e-05, 'epoch': 0.35} 35%|███▍ | 425/1230 [8:21:39<15:45:44, 70.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3993 [2024-07-31 10:33:42,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.21 | bwd_microstep: 5470.30 | bwd_inner_microstep: 5412.43 | bwd_allreduce_microstep: 57.80 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-31 10:33:51,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.05 | bwd_microstep: 5170.51 | bwd_inner_microstep: 5141.29 | bwd_allreduce_microstep: 29.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811 [2024-07-31 10:34:00,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.85 | bwd_microstep: 5046.60 | bwd_inner_microstep: 5027.26 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 10:34:09,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3378.32 | bwd_microstep: 5154.85 | bwd_inner_microstep: 5080.90 | bwd_allreduce_microstep: 73.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 10:34:18,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.56 | bwd_microstep: 5267.22 | bwd_inner_microstep: 5172.51 | bwd_allreduce_microstep: 
94.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 10:34:26,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.41 | bwd_microstep: 5054.71 | bwd_inner_microstep: 5027.64 | bwd_allreduce_microstep: 27.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 10:34:35,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.82 | bwd_microstep: 5061.33 | bwd_inner_microstep: 5000.10 | bwd_allreduce_microstep: 61.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-31 10:34:43,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 10:34:43,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.14 | bwd_microstep: 4897.69 | bwd_inner_microstep: 4522.91 | bwd_allreduce_microstep: 374.72 | step_microstep: 182.11 [2024-07-31 10:34:43,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28700.26 | bwd: 41123.20 | bwd_inner: 40384.97 | bwd_allreduce: 737.75 | step: 182.69 35%|███▍ | 426/1230 [8:22:49<15:43:14, 70.39s/it] {'loss': 1.2149, 'learning_rate': 1.5196328620038059e-05, 'epoch': 0.35} 35%|███▍ | 426/1230 [8:22:49<15:43:14, 70.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 10:34:53,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.48 | bwd_microstep: 5587.00 | bwd_inner_microstep: 5527.17 | bwd_allreduce_microstep: 59.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879 [2024-07-31 10:35:02,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.94 | bwd_microstep: 5287.20 | bwd_inner_microstep: 5228.58 | bwd_allreduce_microstep: 58.55 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2265 [2024-07-31 10:35:10,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3003.22 | bwd_microstep: 4950.85 | bwd_inner_microstep: 4566.82 | bwd_allreduce_microstep: 383.97 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-31 10:35:18,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.58 | bwd_microstep: 5061.34 | bwd_inner_microstep: 5034.35 | bwd_allreduce_microstep: 26.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 10:35:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.88 | bwd_microstep: 5018.21 | bwd_inner_microstep: 4995.65 | bwd_allreduce_microstep: 22.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 10:35:36,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.47 | bwd_microstep: 5060.46 | bwd_inner_microstep: 4997.57 | bwd_allreduce_microstep: 62.82 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3639 [2024-07-31 10:35:44,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.04 | bwd_microstep: 4995.71 | bwd_inner_microstep: 4934.30 | bwd_allreduce_microstep: 61.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 10:35:53,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 10:35:53,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.57 | bwd_microstep: 5009.67 | bwd_inner_microstep: 4961.53 | bwd_allreduce_microstep: 48.07 | step_microstep: 182.64 [2024-07-31 10:35:53,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28636.07 | bwd: 40970.42 | bwd_inner: 40245.91 | bwd_allreduce: 724.03 | step: 183.24 35%|███▍ | 
427/1230 [8:23:59<15:40:15, 70.26s/it] {'loss': 1.1763, 'learning_rate': 1.5173811515141088e-05, 'epoch': 0.35} 35%|███▍ | 427/1230 [8:23:59<15:40:15, 70.26s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3976 [2024-07-31 10:36:03,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3868.37 | bwd_microstep: 5466.28 | bwd_inner_microstep: 5424.96 | bwd_allreduce_microstep: 41.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2298 [2024-07-31 10:36:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.19 | bwd_microstep: 5262.83 | bwd_inner_microstep: 4854.80 | bwd_allreduce_microstep: 407.96 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2236 [2024-07-31 10:36:20,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.09 | bwd_microstep: 5177.03 | bwd_inner_microstep: 4775.43 | bwd_allreduce_microstep: 401.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-31 10:36:28,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.20 | bwd_microstep: 4719.87 | bwd_inner_microstep: 4692.89 | bwd_allreduce_microstep: 26.91 | step_microstep: 0.20 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 10:36:37,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.38 | bwd_microstep: 5263.19 | bwd_inner_microstep: 4855.11 | bwd_allreduce_microstep: 408.00 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2172 [2024-07-31 10:36:46,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.86 | bwd_microstep: 5105.94 | bwd_inner_microstep: 4710.69 | bwd_allreduce_microstep: 395.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3683 [2024-07-31 10:36:53,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3171.75 | bwd_microstep: 4684.44 | bwd_inner_microstep: 4664.98 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 10:37:02,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81 [2024-07-31 10:37:02,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.61 | bwd_microstep: 4911.98 | bwd_inner_microstep: 4892.67 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.86 [2024-07-31 10:37:02,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28115.35 | bwd: 40591.54 | bwd_inner: 38871.47 | bwd_allreduce: 1719.58 | step: 182.56 35%|███▍ | 428/1230 [8:25:08<15:34:11, 69.89s/it] {'loss': 1.2053, 'learning_rate': 1.5151258532162771e-05, 'epoch': 0.35} 35%|███▍ | 428/1230 [8:25:08<15:34:11, 69.89s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2375 [2024-07-31 10:37:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.18 | bwd_microstep: 5579.27 | bwd_inner_microstep: 5150.37 | bwd_allreduce_microstep: 428.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-31 10:37:20,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.33 | bwd_microstep: 5282.92 | bwd_inner_microstep: 4874.58 | bwd_allreduce_microstep: 408.28 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 10:37:29,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.16 | bwd_microstep: 5032.97 | bwd_inner_microstep: 5005.82 | bwd_allreduce_microstep: 27.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-31 10:37:38,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3740.43 | bwd_microstep: 5027.61 | bwd_inner_microstep: 5001.57 | bwd_allreduce_microstep: 25.97 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3632 [2024-07-31 10:37:47,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.27 | bwd_microstep: 5050.49 | bwd_inner_microstep: 5001.92 | bwd_allreduce_microstep: 48.50 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2180 [2024-07-31 10:37:55,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3011.76 | bwd_microstep: 4881.33 | bwd_inner_microstep: 4506.56 | bwd_allreduce_microstep: 374.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 10:38:03,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.21 | bwd_microstep: 5118.48 | bwd_inner_microstep: 4721.92 | bwd_allreduce_microstep: 396.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-07-31 10:38:12,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85 [2024-07-31 10:38:12,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.85 | bwd_microstep: 4880.57 | bwd_inner_microstep: 4861.26 | bwd_allreduce_microstep: 19.23 | step_microstep: 181.86 [2024-07-31 10:38:12,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28567.10 | bwd: 40853.63 | bwd_inner: 39123.95 | bwd_allreduce: 1729.19 | step: 182.44 35%|███▍ | 429/1230 [8:26:18<15:32:28, 69.85s/it] {'loss': 1.2089, 'learning_rate': 1.5128669827498024e-05, 'epoch': 0.35} 35%|███▍ | 429/1230 [8:26:18<15:32:28, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3556 [2024-07-31 10:38:21,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.80 | bwd_microstep: 5460.86 | bwd_inner_microstep: 5281.64 | 
bwd_allreduce_microstep: 179.15 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2298 [2024-07-31 10:38:30,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.80 | bwd_microstep: 5217.81 | bwd_inner_microstep: 4812.65 | bwd_allreduce_microstep: 405.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-31 10:38:39,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.21 | bwd_microstep: 5189.83 | bwd_inner_microstep: 4787.43 | bwd_allreduce_microstep: 402.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3819 [2024-07-31 10:38:47,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.06 | bwd_microstep: 5181.79 | bwd_inner_microstep: 5133.28 | bwd_allreduce_microstep: 48.45 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3651 [2024-07-31 10:38:56,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3109.45 | bwd_microstep: 4977.65 | bwd_inner_microstep: 4915.87 | bwd_allreduce_microstep: 61.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 10:39:04,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.15 | bwd_microstep: 5145.01 | bwd_inner_microstep: 5091.86 | bwd_allreduce_microstep: 53.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 10:39:13,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.37 | bwd_microstep: 4977.63 | bwd_inner_microstep: 4943.19 | bwd_allreduce_microstep: 34.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-07-31 10:39:22,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 
[2024-07-31 10:39:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.59 | bwd_microstep: 4878.92 | bwd_inner_microstep: 4858.22 | bwd_allreduce_microstep: 20.63 | step_microstep: 182.23 [2024-07-31 10:39:22,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28522.33 | bwd: 41029.48 | bwd_inner: 39824.09 | bwd_allreduce: 1204.92 | step: 182.82 35%|███▍ | 430/1230 [8:27:28<15:31:27, 69.86s/it] {'loss': 1.212, 'learning_rate': 1.5106045557789453e-05, 'epoch': 0.35} 35%|███▍ | 430/1230 [8:27:28<15:31:27, 69.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3951 [2024-07-31 10:39:31,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.07 | bwd_microstep: 5316.53 | bwd_inner_microstep: 5262.78 | bwd_allreduce_microstep: 53.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 10:39:40,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.38 | bwd_microstep: 5177.23 | bwd_inner_microstep: 5123.99 | bwd_allreduce_microstep: 53.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-31 10:39:48,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.98 | bwd_microstep: 5151.94 | bwd_inner_microstep: 5101.25 | bwd_allreduce_microstep: 50.62 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3113 [2024-07-31 10:39:57,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.52 | bwd_microstep: 5150.16 | bwd_inner_microstep: 4888.48 | bwd_allreduce_microstep: 261.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 10:40:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.66 | bwd_microstep: 5148.52 | bwd_inner_microstep: 5095.38 | bwd_allreduce_microstep: 
53.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 10:40:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.44 | bwd_microstep: 5054.50 | bwd_inner_microstep: 4991.38 | bwd_allreduce_microstep: 63.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 10:40:23,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.71 | bwd_microstep: 4885.78 | bwd_inner_microstep: 4866.32 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2130 [2024-07-31 10:40:32,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 10:40:32,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.49 | bwd_microstep: 5177.08 | bwd_inner_microstep: 4773.91 | bwd_allreduce_microstep: 403.10 | step_microstep: 181.54 [2024-07-31 10:40:32,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28805.17 | bwd: 41061.72 | bwd_inner: 40103.43 | bwd_allreduce: 957.81 | step: 182.12 35%|███▌ | 431/1230 [8:28:38<15:31:38, 69.96s/it] {'loss': 1.1271, 'learning_rate': 1.5083385879926314e-05, 'epoch': 0.35} 35%|███▌ | 431/1230 [8:28:38<15:31:38, 69.96s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3986 [2024-07-31 10:40:41,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.50 | bwd_microstep: 5589.75 | bwd_inner_microstep: 5500.72 | bwd_allreduce_microstep: 88.96 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3908 [2024-07-31 10:40:50,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.31 | bwd_microstep: 5384.21 | bwd_inner_microstep: 5316.77 | bwd_allreduce_microstep: 67.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3600 [2024-07-31 10:40:59,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.83 | bwd_microstep: 5246.64 | bwd_inner_microstep: 5157.75 | bwd_allreduce_microstep: 88.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 10:41:08,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.93 | bwd_microstep: 4980.21 | bwd_inner_microstep: 4942.92 | bwd_allreduce_microstep: 37.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3753 [2024-07-31 10:41:17,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.79 | bwd_microstep: 4961.00 | bwd_inner_microstep: 4914.91 | bwd_allreduce_microstep: 46.02 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 10:41:25,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.94 | bwd_microstep: 5019.16 | bwd_inner_microstep: 4994.43 | bwd_allreduce_microstep: 24.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 10:41:34,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.13 | bwd_microstep: 5113.29 | bwd_inner_microstep: 4716.33 | bwd_allreduce_microstep: 396.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-31 10:41:43,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 10:41:43,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.76 | bwd_microstep: 5007.20 | bwd_inner_microstep: 4953.59 | bwd_allreduce_microstep: 53.54 | step_microstep: 183.41 [2024-07-31 10:41:43,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29147.09 | bwd: 41301.44 | bwd_inner: 40497.36 | bwd_allreduce: 803.59 | step: 184.01 35%|███▌ | 
432/1230 [8:29:49<15:33:45, 70.21s/it] {'loss': 1.1759, 'learning_rate': 1.5060690951043385e-05, 'epoch': 0.35} 35%|███▌ | 432/1230 [8:29:49<15:33:45, 70.21s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 10:41:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3867.81 | bwd_microstep: 5752.79 | bwd_inner_microstep: 5733.69 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3652 [2024-07-31 10:42:02,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.29 | bwd_microstep: 5397.11 | bwd_inner_microstep: 5307.93 | bwd_allreduce_microstep: 89.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 10:42:10,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.70 | bwd_microstep: 5158.73 | bwd_inner_microstep: 5105.86 | bwd_allreduce_microstep: 52.80 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 10:42:19,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.70 | bwd_microstep: 5108.62 | bwd_inner_microstep: 5035.79 | bwd_allreduce_microstep: 72.77 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2188 [2024-07-31 10:42:28,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3439.81 | bwd_microstep: 5003.77 | bwd_inner_microstep: 4615.65 | bwd_allreduce_microstep: 388.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 10:42:37,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.76 | bwd_microstep: 5133.90 | bwd_inner_microstep: 5082.29 | bwd_allreduce_microstep: 51.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 
2156 [2024-07-31 10:42:45,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.55 | bwd_microstep: 5037.68 | bwd_inner_microstep: 4647.35 | bwd_allreduce_microstep: 390.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 10:42:54,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-31 10:42:54,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.25 | bwd_microstep: 4887.21 | bwd_inner_microstep: 4867.80 | bwd_allreduce_microstep: 19.32 | step_microstep: 183.41 [2024-07-31 10:42:54,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29178.78 | bwd: 41479.79 | bwd_inner: 40396.30 | bwd_allreduce: 1082.99 | step: 184.12 35%|███▌ | 433/1230 [8:31:00<15:35:43, 70.44s/it] {'loss': 1.1752, 'learning_rate': 1.5037960928519902e-05, 'epoch': 0.35} 35%|███▌ | 433/1230 [8:31:00<15:35:43, 70.44s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4064 [2024-07-31 10:43:03,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.43 | bwd_microstep: 5148.49 | bwd_inner_microstep: 5129.31 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2025 [2024-07-31 10:43:12,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.98 | bwd_microstep: 5222.12 | bwd_inner_microstep: 4817.82 | bwd_allreduce_microstep: 404.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 10:43:20,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.46 | bwd_microstep: 5196.60 | bwd_inner_microstep: 5117.95 | bwd_allreduce_microstep: 78.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 10:43:29,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3722.28 | bwd_microstep: 4985.59 | bwd_inner_microstep: 4966.21 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-31 10:43:38,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.15 | bwd_microstep: 4996.03 | bwd_inner_microstep: 4938.47 | bwd_allreduce_microstep: 57.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 10:43:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.03 | bwd_microstep: 4890.36 | bwd_inner_microstep: 4870.80 | bwd_allreduce_microstep: 19.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 10:43:55,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.53 | bwd_microstep: 5190.91 | bwd_inner_microstep: 5105.74 | bwd_allreduce_microstep: 85.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 10:44:04,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 10:44:04,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.38 | bwd_microstep: 5176.79 | bwd_inner_microstep: 5092.66 | bwd_allreduce_microstep: 84.06 | step_microstep: 181.67 [2024-07-31 10:44:04,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29086.16 | bwd: 40806.87 | bwd_inner: 40038.91 | bwd_allreduce: 767.47 | step: 182.28 35%|███▌ | 434/1230 [8:32:10<15:33:41, 70.38s/it] {'loss': 1.1923, 'learning_rate': 1.501519596997847e-05, 'epoch': 0.35} 35%|███▌ | 434/1230 [8:32:10<15:33:41, 70.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3826 [2024-07-31 10:44:13,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.16 | bwd_microstep: 5607.59 | bwd_inner_microstep: 5509.21 | 
bwd_allreduce_microstep: 98.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4068 [2024-07-31 10:44:22,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.76 | bwd_microstep: 5197.65 | bwd_inner_microstep: 5178.34 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3866 [2024-07-31 10:44:31,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.85 | bwd_microstep: 5114.37 | bwd_inner_microstep: 5095.05 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-31 10:44:40,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.93 | bwd_microstep: 5210.40 | bwd_inner_microstep: 4803.41 | bwd_allreduce_microstep: 406.92 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3719 [2024-07-31 10:44:49,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.31 | bwd_microstep: 5147.57 | bwd_inner_microstep: 5076.66 | bwd_allreduce_microstep: 70.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-31 10:44:58,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.92 | bwd_microstep: 5015.42 | bwd_inner_microstep: 4996.05 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 10:45:06,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.87 | bwd_microstep: 4916.03 | bwd_inner_microstep: 4896.74 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2176 [2024-07-31 10:45:15,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 
[2024-07-31 10:45:15,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3446.04 | bwd_microstep: 5023.26 | bwd_inner_microstep: 4633.58 | bwd_allreduce_microstep: 389.61 | step_microstep: 181.38 [2024-07-31 10:45:15,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29237.74 | bwd: 41232.27 | bwd_inner: 40188.98 | bwd_allreduce: 1042.79 | step: 181.96 35%|███▌ | 435/1230 [8:33:21<15:34:12, 70.51s/it] {'loss': 1.1593, 'learning_rate': 1.499239623328394e-05, 'epoch': 0.35} 35%|███▌ | 435/1230 [8:33:21<15:34:12, 70.51s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4064 [2024-07-31 10:45:24,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.38 | bwd_microstep: 5588.35 | bwd_inner_microstep: 5522.46 | bwd_allreduce_microstep: 65.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 10:45:33,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.83 | bwd_microstep: 5103.50 | bwd_inner_microstep: 5077.35 | bwd_allreduce_microstep: 26.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3802 [2024-07-31 10:45:42,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.14 | bwd_microstep: 5032.55 | bwd_inner_microstep: 4999.30 | bwd_allreduce_microstep: 33.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-31 10:45:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.89 | bwd_microstep: 5039.27 | bwd_inner_microstep: 5019.96 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-31 10:45:59,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.17 | bwd_microstep: 5303.13 | bwd_inner_microstep: 4892.74 | bwd_allreduce_microstep: 
410.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 10:46:08,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.15 | bwd_microstep: 4982.99 | bwd_inner_microstep: 4963.68 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-31 10:46:16,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.60 | bwd_microstep: 4799.64 | bwd_inner_microstep: 4780.25 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3638 [2024-07-31 10:46:25,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 10:46:25,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.95 | bwd_microstep: 5051.51 | bwd_inner_microstep: 4977.02 | bwd_allreduce_microstep: 74.43 | step_microstep: 182.04 [2024-07-31 10:46:25,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28968.01 | bwd: 40900.92 | bwd_inner: 40232.71 | bwd_allreduce: 667.71 | step: 182.62 35%|███▌ | 436/1230 [8:34:31<15:31:49, 70.42s/it] {'loss': 1.1768, 'learning_rate': 1.4969561876542348e-05, 'epoch': 0.35} 35%|███▌ | 436/1230 [8:34:31<15:31:49, 70.42s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2273 [2024-07-31 10:46:34,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.52 | bwd_microstep: 5309.59 | bwd_inner_microstep: 4899.27 | bwd_allreduce_microstep: 410.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-31 10:46:42,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.41 | bwd_microstep: 5094.49 | bwd_inner_microstep: 5019.03 | bwd_allreduce_microstep: 75.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3779 [2024-07-31 10:46:51,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.86 | bwd_microstep: 5020.08 | bwd_inner_microstep: 5000.71 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-31 10:47:00,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.13 | bwd_microstep: 5227.29 | bwd_inner_microstep: 4820.84 | bwd_allreduce_microstep: 406.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-31 10:47:09,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.75 | bwd_microstep: 4992.93 | bwd_inner_microstep: 4973.64 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-31 10:47:17,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.03 | bwd_microstep: 4803.91 | bwd_inner_microstep: 4784.51 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 10:47:25,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.73 | bwd_microstep: 5071.50 | bwd_inner_microstep: 4678.17 | bwd_allreduce_microstep: 393.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 10:47:34,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 10:47:34,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.28 | bwd_microstep: 4945.85 | bwd_inner_microstep: 4921.14 | bwd_allreduce_microstep: 24.64 | step_microstep: 181.36 [2024-07-31 10:47:34,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28236.63 | bwd: 40465.61 | bwd_inner: 39097.24 | bwd_allreduce: 1367.86 | step: 181.94 36%|███▌ | 
437/1230 [8:35:40<15:25:10, 70.00s/it] {'loss': 1.2081, 'learning_rate': 1.4946693058099802e-05, 'epoch': 0.36} 36%|███▌ | 437/1230 [8:35:40<15:25:10, 70.00s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 10:47:43,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3844.83 | bwd_microstep: 5336.47 | bwd_inner_microstep: 5317.41 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3951 [2024-07-31 10:47:52,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.98 | bwd_microstep: 5188.88 | bwd_inner_microstep: 5169.52 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 10:48:01,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.80 | bwd_microstep: 5130.64 | bwd_inner_microstep: 5058.07 | bwd_allreduce_microstep: 72.50 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-31 10:48:10,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.09 | bwd_microstep: 5095.90 | bwd_inner_microstep: 5053.06 | bwd_allreduce_microstep: 42.78 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2971 [2024-07-31 10:48:19,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.30 | bwd_microstep: 5209.68 | bwd_inner_microstep: 4823.11 | bwd_allreduce_microstep: 386.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 10:48:27,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.39 | bwd_microstep: 5057.68 | bwd_inner_microstep: 4999.10 | bwd_allreduce_microstep: 58.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3688 [2024-07-31 10:48:36,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.63 | bwd_microstep: 5058.25 | bwd_inner_microstep: 5001.17 | bwd_allreduce_microstep: 57.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 10:48:45,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 10:48:45,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.07 | bwd_microstep: 4991.39 | bwd_inner_microstep: 4961.29 | bwd_allreduce_microstep: 30.03 | step_microstep: 181.37 [2024-07-31 10:48:45,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29269.02 | bwd: 41068.86 | bwd_inner: 40382.66 | bwd_allreduce: 685.72 | step: 182.05 36%|███▌ | 438/1230 [8:36:51<15:26:39, 70.20s/it] {'loss': 1.1572, 'learning_rate': 1.492378993654138e-05, 'epoch': 0.36} 36%|███▌ | 438/1230 [8:36:51<15:26:39, 70.20s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3925 [2024-07-31 10:48:54,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.14 | bwd_microstep: 5150.69 | bwd_inner_microstep: 5131.57 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-31 10:49:03,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.37 | bwd_microstep: 5328.37 | bwd_inner_microstep: 4913.51 | bwd_allreduce_microstep: 414.79 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2840 [2024-07-31 10:49:11,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.51 | bwd_microstep: 5177.62 | bwd_inner_microstep: 4772.68 | bwd_allreduce_microstep: 404.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 10:49:20,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3651.51 | bwd_microstep: 5247.03 | bwd_inner_microstep: 5151.76 | bwd_allreduce_microstep: 95.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 10:49:29,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.06 | bwd_microstep: 4989.54 | bwd_inner_microstep: 4970.23 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 10:49:38,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.87 | bwd_microstep: 5092.44 | bwd_inner_microstep: 4699.05 | bwd_allreduce_microstep: 393.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3728 [2024-07-31 10:49:46,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.80 | bwd_microstep: 5116.50 | bwd_inner_microstep: 5033.32 | bwd_allreduce_microstep: 83.12 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 10:49:55,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 10:49:55,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.33 | bwd_microstep: 5051.47 | bwd_inner_microstep: 4993.64 | bwd_allreduce_microstep: 57.76 | step_microstep: 181.32 [2024-07-31 10:49:55,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29026.49 | bwd: 41153.63 | bwd_inner: 39665.69 | bwd_allreduce: 1487.42 | step: 181.91 36%|███▌ | 439/1230 [8:38:01<15:26:43, 70.29s/it] {'loss': 1.162, 'learning_rate': 1.4900852670690044e-05, 'epoch': 0.36} 36%|███▌ | 439/1230 [8:38:01<15:26:43, 70.29s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3855 [2024-07-31 10:50:04,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.35 | bwd_microstep: 5155.60 | bwd_inner_microstep: 5132.00 | 
bwd_allreduce_microstep: 23.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-31 10:50:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.55 | bwd_microstep: 5179.44 | bwd_inner_microstep: 5094.82 | bwd_allreduce_microstep: 84.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3744 [2024-07-31 10:50:22,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.03 | bwd_microstep: 5177.41 | bwd_inner_microstep: 5093.05 | bwd_allreduce_microstep: 84.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-31 10:50:31,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.45 | bwd_microstep: 5171.48 | bwd_inner_microstep: 5092.30 | bwd_allreduce_microstep: 79.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 10:50:39,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.79 | bwd_microstep: 5168.08 | bwd_inner_microstep: 4767.02 | bwd_allreduce_microstep: 401.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 10:50:48,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.15 | bwd_microstep: 5058.42 | bwd_inner_microstep: 4664.51 | bwd_allreduce_microstep: 393.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-31 10:50:57,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.70 | bwd_microstep: 5070.86 | bwd_inner_microstep: 5023.32 | bwd_allreduce_microstep: 47.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 10:51:06,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 
[2024-07-31 10:51:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.91 | bwd_microstep: 5068.48 | bwd_inner_microstep: 4676.09 | bwd_allreduce_microstep: 392.32 | step_microstep: 181.31 [2024-07-31 10:51:06,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28866.85 | bwd: 41049.76 | bwd_inner: 39543.04 | bwd_allreduce: 1506.23 | step: 181.90 36%|███▌ | 440/1230 [8:39:11<15:25:22, 70.28s/it] {'loss': 1.1952, 'learning_rate': 1.4877881419605531e-05, 'epoch': 0.36} 36%|███▌ | 440/1230 [8:39:11<15:25:22, 70.28s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 10:51:15,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.94 | bwd_microstep: 5469.40 | bwd_inner_microstep: 5378.56 | bwd_allreduce_microstep: 90.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-31 10:51:24,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.08 | bwd_microstep: 5442.83 | bwd_inner_microstep: 5024.38 | bwd_allreduce_microstep: 418.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 10:51:33,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.64 | bwd_microstep: 5209.50 | bwd_inner_microstep: 4808.37 | bwd_allreduce_microstep: 401.06 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3699 [2024-07-31 10:51:41,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.20 | bwd_microstep: 5121.41 | bwd_inner_microstep: 5034.47 | bwd_allreduce_microstep: 86.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 10:51:50,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.29 | bwd_microstep: 5111.80 | bwd_inner_microstep: 5047.31 | bwd_allreduce_microstep: 
64.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 10:51:59,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.96 | bwd_microstep: 4882.51 | bwd_inner_microstep: 4863.15 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 10:52:07,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.29 | bwd_microstep: 5125.80 | bwd_inner_microstep: 5060.48 | bwd_allreduce_microstep: 65.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 10:52:16,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 10:52:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.22 | bwd_microstep: 5166.32 | bwd_inner_microstep: 5087.82 | bwd_allreduce_microstep: 78.44 | step_microstep: 182.68 [2024-07-31 10:52:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28866.54 | bwd: 41529.55 | bwd_inner: 40304.49 | bwd_allreduce: 1224.58 | step: 183.28 36%|███▌ | 441/1230 [8:40:22<15:25:57, 70.42s/it] {'loss': 1.1586, 'learning_rate': 1.4854876342583246e-05, 'epoch': 0.36} 36%|███▌ | 441/1230 [8:40:22<15:25:57, 70.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 4096 [2024-07-31 10:52:25,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.94 | bwd_microstep: 5305.07 | bwd_inner_microstep: 5265.00 | bwd_allreduce_microstep: 39.99 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3861 [2024-07-31 10:52:34,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.65 | bwd_microstep: 5069.08 | bwd_inner_microstep: 5034.35 | bwd_allreduce_microstep: 34.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3760 [2024-07-31 10:52:43,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.92 | bwd_microstep: 5238.83 | bwd_inner_microstep: 5195.72 | bwd_allreduce_microstep: 43.04 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 10:52:52,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.51 | bwd_microstep: 5258.02 | bwd_inner_microstep: 4850.23 | bwd_allreduce_microstep: 407.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 10:53:01,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.01 | bwd_microstep: 5247.90 | bwd_inner_microstep: 5160.37 | bwd_allreduce_microstep: 87.46 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-31 10:53:10,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.38 | bwd_microstep: 4982.95 | bwd_inner_microstep: 4963.55 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 10:53:18,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.73 | bwd_microstep: 4966.48 | bwd_inner_microstep: 4919.98 | bwd_allreduce_microstep: 46.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666 [2024-07-31 10:53:27,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 10:53:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.49 | bwd_microstep: 5080.68 | bwd_inner_microstep: 4991.85 | bwd_allreduce_microstep: 88.77 | step_microstep: 182.36 [2024-07-31 10:53:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29146.53 | bwd: 41149.01 | bwd_inner: 40380.99 | bwd_allreduce: 767.51 | step: 183.09 36%|███▌ | 
442/1230 [8:41:33<15:25:38, 70.48s/it] {'loss': 1.2054, 'learning_rate': 1.4831837599153167e-05, 'epoch': 0.36} 36%|███▌ | 442/1230 [8:41:33<15:25:38, 70.48s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 10:53:36,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3885.82 | bwd_microstep: 5361.41 | bwd_inner_microstep: 5342.37 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3775 [2024-07-31 10:53:45,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.65 | bwd_microstep: 5223.66 | bwd_inner_microstep: 5144.27 | bwd_allreduce_microstep: 79.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 10:53:54,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.87 | bwd_microstep: 5148.74 | bwd_inner_microstep: 5067.91 | bwd_allreduce_microstep: 80.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220 [2024-07-31 10:54:02,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.71 | bwd_microstep: 5030.37 | bwd_inner_microstep: 4642.39 | bwd_allreduce_microstep: 387.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 10:54:11,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.71 | bwd_microstep: 5206.13 | bwd_inner_microstep: 5145.03 | bwd_allreduce_microstep: 61.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-07-31 10:54:19,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.99 | bwd_microstep: 4902.84 | bwd_inner_microstep: 4877.37 | bwd_allreduce_microstep: 25.40 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 
2113 [2024-07-31 10:54:27,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3021.47 | bwd_microstep: 4932.34 | bwd_inner_microstep: 4554.52 | bwd_allreduce_microstep: 377.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 10:54:36,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 10:54:36,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.17 | bwd_microstep: 4959.78 | bwd_inner_microstep: 4912.24 | bwd_allreduce_microstep: 47.47 | step_microstep: 182.24 [2024-07-31 10:54:36,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28048.29 | bwd: 40765.26 | bwd_inner: 39686.04 | bwd_allreduce: 1078.72 | step: 182.83 36%|███▌ | 443/1230 [8:42:42<15:19:13, 70.08s/it] {'loss': 1.1601, 'learning_rate': 1.4808765349078729e-05, 'epoch': 0.36} 36%|███▌ | 443/1230 [8:42:42<15:19:13, 70.08s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4083 [2024-07-31 10:54:45,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3844.48 | bwd_microstep: 5320.97 | bwd_inner_microstep: 5301.89 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3578 [2024-07-31 10:54:53,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3119.04 | bwd_microstep: 5044.28 | bwd_inner_microstep: 4975.34 | bwd_allreduce_microstep: 68.86 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3605 [2024-07-31 10:55:02,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.36 | bwd_microstep: 5164.03 | bwd_inner_microstep: 5077.31 | bwd_allreduce_microstep: 86.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 10:55:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3719.86 | bwd_microstep: 5029.72 | bwd_inner_microstep: 4992.76 | bwd_allreduce_microstep: 36.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 10:55:20,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.90 | bwd_microstep: 5144.04 | bwd_inner_microstep: 4741.56 | bwd_allreduce_microstep: 402.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 10:55:28,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.25 | bwd_microstep: 4979.02 | bwd_inner_microstep: 4959.66 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 10:55:37,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.72 | bwd_microstep: 5027.30 | bwd_inner_microstep: 4970.92 | bwd_allreduce_microstep: 56.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-31 10:55:46,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 10:55:46,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.87 | bwd_microstep: 4894.57 | bwd_inner_microstep: 4875.12 | bwd_allreduce_microstep: 19.37 | step_microstep: 182.00 [2024-07-31 10:55:46,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28690.40 | bwd: 40603.91 | bwd_inner: 39894.51 | bwd_allreduce: 708.90 | step: 182.58 36%|███▌ | 444/1230 [8:43:52<15:16:16, 69.94s/it] {'loss': 1.179, 'learning_rate': 1.4785659752355724e-05, 'epoch': 0.36} 36%|███▌ | 444/1230 [8:43:52<15:16:16, 69.94s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 10:55:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.63 | bwd_microstep: 5589.42 | bwd_inner_microstep: 5526.47 | 
bwd_allreduce_microstep: 62.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3559 [2024-07-31 10:56:04,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.52 | bwd_microstep: 5164.97 | bwd_inner_microstep: 5077.36 | bwd_allreduce_microstep: 87.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 10:56:13,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.02 | bwd_microstep: 4984.27 | bwd_inner_microstep: 4964.94 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2219 [2024-07-31 10:56:21,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3045.71 | bwd_microstep: 5002.18 | bwd_inner_microstep: 4616.44 | bwd_allreduce_microstep: 385.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3735 [2024-07-31 10:56:29,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.05 | bwd_microstep: 5172.21 | bwd_inner_microstep: 5088.21 | bwd_allreduce_microstep: 83.93 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3718 [2024-07-31 10:56:38,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.50 | bwd_microstep: 5174.50 | bwd_inner_microstep: 5100.43 | bwd_allreduce_microstep: 74.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 10:56:47,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.90 | bwd_microstep: 5166.16 | bwd_inner_microstep: 5094.74 | bwd_allreduce_microstep: 71.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3630 [2024-07-31 10:56:56,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 
[2024-07-31 10:56:56,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.59 | bwd_microstep: 5184.28 | bwd_inner_microstep: 5089.83 | bwd_allreduce_microstep: 94.38 | step_microstep: 182.48 [2024-07-31 10:56:56,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28570.82 | bwd: 41437.98 | bwd_inner: 40558.36 | bwd_allreduce: 879.12 | step: 183.08 36%|███▌ | 445/1230 [8:45:02<15:16:39, 70.06s/it] {'loss': 1.1822, 'learning_rate': 1.4762520969211186e-05, 'epoch': 0.36} 36%|███▌ | 445/1230 [8:45:02<15:16:39, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3859 [2024-07-31 10:57:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.07 | bwd_microstep: 5105.30 | bwd_inner_microstep: 5067.34 | bwd_allreduce_microstep: 37.90 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2046 [2024-07-31 10:57:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3310.03 | bwd_microstep: 5224.97 | bwd_inner_microstep: 4820.86 | bwd_allreduce_microstep: 404.04 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 10:57:22,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.06 | bwd_microstep: 5208.75 | bwd_inner_microstep: 4802.92 | bwd_allreduce_microstep: 405.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 10:57:31,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.40 | bwd_microstep: 5157.91 | bwd_inner_microstep: 5075.49 | bwd_allreduce_microstep: 82.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 10:57:40,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.46 | bwd_microstep: 5157.52 | bwd_inner_microstep: 5101.58 | bwd_allreduce_microstep: 
55.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 10:57:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.28 | bwd_microstep: 5129.66 | bwd_inner_microstep: 4731.18 | bwd_allreduce_microstep: 398.41 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2122 [2024-07-31 10:57:57,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3446.66 | bwd_microstep: 5015.77 | bwd_inner_microstep: 4625.99 | bwd_allreduce_microstep: 389.68 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2142 [2024-07-31 10:58:06,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 10:58:06,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.32 | bwd_microstep: 5093.93 | bwd_inner_microstep: 4702.00 | bwd_allreduce_microstep: 391.86 | step_microstep: 181.28 [2024-07-31 10:58:06,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28239.17 | bwd: 41093.78 | bwd_inner: 38927.30 | bwd_allreduce: 2165.99 | step: 181.99 36%|███▋ | 446/1230 [8:46:12<15:13:55, 69.94s/it] {'loss': 1.173, 'learning_rate': 1.4739349160102285e-05, 'epoch': 0.36} 36%|███▋ | 446/1230 [8:46:12<15:13:55, 69.94s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 10:58:15,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.49 | bwd_microstep: 5375.32 | bwd_inner_microstep: 5356.23 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-31 10:58:24,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.82 | bwd_microstep: 5330.89 | bwd_inner_microstep: 4918.76 | bwd_allreduce_microstep: 412.06 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2091 [2024-07-31 10:58:33,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.82 | bwd_microstep: 5222.02 | bwd_inner_microstep: 4816.23 | bwd_allreduce_microstep: 405.73 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-31 10:58:41,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.65 | bwd_microstep: 5275.00 | bwd_inner_microstep: 4866.18 | bwd_allreduce_microstep: 408.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-31 10:58:49,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.64 | bwd_microstep: 4892.29 | bwd_inner_microstep: 4514.40 | bwd_allreduce_microstep: 377.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 10:58:58,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.02 | bwd_microstep: 5048.64 | bwd_inner_microstep: 4992.91 | bwd_allreduce_microstep: 55.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 10:59:07,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.04 | bwd_microstep: 5288.50 | bwd_inner_microstep: 5191.38 | bwd_allreduce_microstep: 97.06 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2122 [2024-07-31 10:59:16,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 10:59:16,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.01 | bwd_microstep: 5119.32 | bwd_inner_microstep: 4721.67 | bwd_allreduce_microstep: 397.59 | step_microstep: 181.76 [2024-07-31 10:59:16,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28287.39 | bwd: 41551.97 | bwd_inner: 39377.69 | bwd_allreduce: 2173.80 | step: 182.36 36%|███▋ | 
447/1230 [8:47:22<15:13:38, 70.01s/it] {'loss': 1.2011, 'learning_rate': 1.471614448571521e-05, 'epoch': 0.36} 36%|███▋ | 447/1230 [8:47:22<15:13:38, 70.01s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2800 [2024-07-31 10:59:25,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.18 | bwd_microstep: 5367.98 | bwd_inner_microstep: 4953.71 | bwd_allreduce_microstep: 414.20 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2038 [2024-07-31 10:59:34,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.80 | bwd_microstep: 5239.50 | bwd_inner_microstep: 4832.04 | bwd_allreduce_microstep: 407.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 10:59:42,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.52 | bwd_microstep: 5166.92 | bwd_inner_microstep: 4766.79 | bwd_allreduce_microstep: 400.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820 [2024-07-31 10:59:51,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.28 | bwd_microstep: 5142.76 | bwd_inner_microstep: 5096.41 | bwd_allreduce_microstep: 46.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 11:00:00,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.78 | bwd_microstep: 4960.20 | bwd_inner_microstep: 4927.22 | bwd_allreduce_microstep: 32.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3843 [2024-07-31 11:00:09,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.24 | bwd_microstep: 4983.14 | bwd_inner_microstep: 4959.67 | bwd_allreduce_microstep: 23.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3634 [2024-07-31 11:00:17,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.01 | bwd_microstep: 5129.76 | bwd_inner_microstep: 5049.30 | bwd_allreduce_microstep: 80.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 11:00:25,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-31 11:00:25,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.49 | bwd_microstep: 4725.92 | bwd_inner_microstep: 4702.01 | bwd_allreduce_microstep: 23.83 | step_microstep: 182.88 [2024-07-31 11:00:25,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28554.22 | bwd: 40716.15 | bwd_inner: 39287.07 | bwd_allreduce: 1428.57 | step: 183.46 36%|███▋ | 448/1230 [8:48:31<15:10:52, 69.89s/it] {'loss': 1.1721, 'learning_rate': 1.4692907106964051e-05, 'epoch': 0.36} 36%|███▋ | 448/1230 [8:48:31<15:10:52, 69.89s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3827 [2024-07-31 11:00:34,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3828.33 | bwd_microstep: 5162.18 | bwd_inner_microstep: 5129.18 | bwd_allreduce_microstep: 32.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-31 11:00:43,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.45 | bwd_microstep: 5199.03 | bwd_inner_microstep: 4794.42 | bwd_allreduce_microstep: 404.53 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2263 [2024-07-31 11:00:52,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.50 | bwd_microstep: 5367.12 | bwd_inner_microstep: 4951.58 | bwd_allreduce_microstep: 415.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 11:01:00,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3229.48 | bwd_microstep: 4854.82 | bwd_inner_microstep: 4806.96 | bwd_allreduce_microstep: 47.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 11:01:09,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.60 | bwd_microstep: 5035.43 | bwd_inner_microstep: 4972.35 | bwd_allreduce_microstep: 63.01 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 11:01:18,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.42 | bwd_microstep: 4992.08 | bwd_inner_microstep: 4958.85 | bwd_allreduce_microstep: 33.17 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3655 [2024-07-31 11:01:26,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.79 | bwd_microstep: 5177.60 | bwd_inner_microstep: 5086.61 | bwd_allreduce_microstep: 90.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 11:01:35,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 11:01:35,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.39 | bwd_microstep: 5061.44 | bwd_inner_microstep: 4995.28 | bwd_allreduce_microstep: 66.09 | step_microstep: 181.66 [2024-07-31 11:01:35,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28631.87 | bwd: 40849.67 | bwd_inner: 39695.16 | bwd_allreduce: 1154.01 | step: 182.25 37%|███▋ | 449/1230 [8:49:41<15:09:25, 69.87s/it] {'loss': 1.1786, 'learning_rate': 1.46696371849897e-05, 'epoch': 0.37} 37%|███▋ | 449/1230 [8:49:41<15:09:25, 69.87s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096 [2024-07-31 11:01:44,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.68 | bwd_microstep: 5359.57 | bwd_inner_microstep: 5332.01 | 
bwd_allreduce_microstep: 27.49 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 11:01:53,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.05 | bwd_microstep: 5242.72 | bwd_inner_microstep: 5184.71 | bwd_allreduce_microstep: 57.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2296 [2024-07-31 11:02:02,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.10 | bwd_microstep: 5314.10 | bwd_inner_microstep: 4904.34 | bwd_allreduce_microstep: 409.69 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822 [2024-07-31 11:02:11,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.92 | bwd_microstep: 5036.36 | bwd_inner_microstep: 5017.04 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2093 [2024-07-31 11:02:20,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.32 | bwd_microstep: 5332.65 | bwd_inner_microstep: 4918.73 | bwd_allreduce_microstep: 413.85 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 11:02:28,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3299.86 | bwd_microstep: 5125.42 | bwd_inner_microstep: 4726.36 | bwd_allreduce_microstep: 398.99 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2166 [2024-07-31 11:02:37,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.25 | bwd_microstep: 5089.02 | bwd_inner_microstep: 4695.16 | bwd_allreduce_microstep: 393.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-31 11:02:46,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 
[2024-07-31 11:02:46,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.30 | bwd_microstep: 4894.81 | bwd_inner_microstep: 4869.46 | bwd_allreduce_microstep: 25.29 | step_microstep: 181.98
[2024-07-31 11:02:46,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28751.36 | bwd: 41394.64 | bwd_inner: 39647.76 | bwd_allreduce: 1746.38 | step: 182.68
37%|███▋ | 450/1230 [8:50:52<15:10:38, 70.05s/it]
{'loss': 1.2071, 'learning_rate': 1.4646334881158706e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 11:02:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3868.04 | bwd_microstep: 5393.36 | bwd_inner_microstep: 5374.20 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2814
[2024-07-31 11:03:04,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.80 | bwd_microstep: 5337.88 | bwd_inner_microstep: 4921.47 | bwd_allreduce_microstep: 416.34 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3829
[2024-07-31 11:03:13,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.79 | bwd_microstep: 5132.51 | bwd_inner_microstep: 5069.99 | bwd_allreduce_microstep: 62.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774
[2024-07-31 11:03:21,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.14 | bwd_microstep: 4822.16 | bwd_inner_microstep: 4802.73 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3721
[2024-07-31 11:03:29,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3143.73 | bwd_microstep: 4801.81 | bwd_inner_microstep: 4777.77 | bwd_allreduce_microstep: 23.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 11:03:37,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.17 | bwd_microstep: 4845.30 | bwd_inner_microstep: 4801.41 | bwd_allreduce_microstep: 43.83 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717
[2024-07-31 11:03:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.23 | bwd_microstep: 4991.03 | bwd_inner_microstep: 4971.71 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159
[2024-07-31 11:03:55,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 11:03:55,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.40 | bwd_microstep: 5147.83 | bwd_inner_microstep: 4747.93 | bwd_allreduce_microstep: 399.83 | step_microstep: 181.18
[2024-07-31 11:03:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27978.20 | bwd: 40471.87 | bwd_inner: 39467.15 | bwd_allreduce: 1004.23 | step: 181.77
37%|███▋ | 451/1230 [8:52:00<15:04:32, 69.67s/it]
{'loss': 1.2234, 'learning_rate': 1.4623000357062184e-05, 'epoch': 0.37}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3912
[2024-07-31 11:04:04,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.69 | bwd_microstep: 5320.61 | bwd_inner_microstep: 5267.63 | bwd_allreduce_microstep: 52.91 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3993
[2024-07-31 11:04:12,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.37 | bwd_microstep: 5269.58 | bwd_inner_microstep: 5232.25 | bwd_allreduce_microstep: 37.26 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 11:04:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.16 | bwd_microstep: 5262.19 | bwd_inner_microstep: 4851.01 | bwd_allreduce_microstep: 411.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780
[2024-07-31 11:04:30,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.99 | bwd_microstep: 5236.16 | bwd_inner_microstep: 5195.10 | bwd_allreduce_microstep: 41.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764
[2024-07-31 11:04:39,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.39 | bwd_microstep: 5003.63 | bwd_inner_microstep: 4984.25 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631
[2024-07-31 11:04:48,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.84 | bwd_microstep: 5198.89 | bwd_inner_microstep: 5116.61 | bwd_allreduce_microstep: 82.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 11:04:57,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.72 | bwd_microstep: 5137.25 | bwd_inner_microstep: 5067.01 | bwd_allreduce_microstep: 70.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707
[2024-07-31 11:05:06,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 11:05:06,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.03 | bwd_microstep: 5059.73 | bwd_inner_microstep: 4997.40 | bwd_allreduce_microstep: 62.25 | step_microstep: 181.51
[2024-07-31 11:05:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29237.10 | bwd: 41488.01 | bwd_inner: 40711.21 | bwd_allreduce: 776.31 | step: 182.09
37%|███▋ | 452/1230 [8:53:11<15:08:46, 70.09s/it]
{'loss': 1.16, 'learning_rate': 1.459963377451468e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3993
[2024-07-31 11:05:15,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.28 | bwd_microstep: 5280.08 | bwd_inner_microstep: 5260.86 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3828
[2024-07-31 11:05:23,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.22 | bwd_microstep: 4955.32 | bwd_inner_microstep: 4931.13 | bwd_allreduce_microstep: 24.12 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2250
[2024-07-31 11:05:32,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.15 | bwd_microstep: 5193.27 | bwd_inner_microstep: 4788.39 | bwd_allreduce_microstep: 404.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732
[2024-07-31 11:05:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.76 | bwd_microstep: 4979.34 | bwd_inner_microstep: 4960.02 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727
[2024-07-31 11:05:50,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.78 | bwd_microstep: 4993.75 | bwd_inner_microstep: 4974.39 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2126
[2024-07-31 11:05:58,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.07 | bwd_microstep: 5227.43 | bwd_inner_microstep: 4820.73 | bwd_allreduce_microstep: 406.63 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 11:06:07,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.12 | bwd_microstep: 5007.48 | bwd_inner_microstep: 4955.60 | bwd_allreduce_microstep: 51.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673
[2024-07-31 11:06:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 11:06:15,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3174.42 | bwd_microstep: 4706.08 | bwd_inner_microstep: 4682.38 | bwd_allreduce_microstep: 23.63 | step_microstep: 183.00
[2024-07-31 11:06:15,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28744.70 | bwd: 40342.74 | bwd_inner: 39373.44 | bwd_allreduce: 968.79 | step: 183.59
37%|███▋ | 453/1230 [8:54:21<15:05:03, 69.89s/it]
{'loss': 1.1922, 'learning_rate': 1.457623529555305e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3903
[2024-07-31 11:06:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.40 | bwd_microstep: 5186.39 | bwd_inner_microstep: 5167.33 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590
[2024-07-31 11:06:33,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.75 | bwd_microstep: 5271.72 | bwd_inner_microstep: 5173.54 | bwd_allreduce_microstep: 98.10 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250
[2024-07-31 11:06:42,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.20 | bwd_microstep: 5178.20 | bwd_inner_microstep: 4776.75 | bwd_allreduce_microstep: 401.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 11:06:50,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.77 | bwd_microstep: 5006.49 | bwd_inner_microstep: 4985.72 | bwd_allreduce_microstep: 20.70 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3821
[2024-07-31 11:06:59,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.47 | bwd_microstep: 5151.48 | bwd_inner_microstep: 5105.84 | bwd_allreduce_microstep: 45.56 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 11:07:08,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.55 | bwd_microstep: 5031.85 | bwd_inner_microstep: 5004.17 | bwd_allreduce_microstep: 27.60 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3677
[2024-07-31 11:07:17,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.76 | bwd_microstep: 5077.80 | bwd_inner_microstep: 5003.30 | bwd_allreduce_microstep: 74.43 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714
[2024-07-31 11:07:26,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 11:07:26,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.86 | bwd_microstep: 5013.86 | bwd_inner_microstep: 4988.77 | bwd_allreduce_microstep: 25.03 | step_microstep: 182.80
[2024-07-31 11:07:26,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29385.68 | bwd: 40917.76 | bwd_inner: 40205.35 | bwd_allreduce: 711.90 | step: 183.53
37%|███▋ | 454/1230 [8:55:32<15:06:48, 70.11s/it]
{'loss': 1.1721, 'learning_rate': 1.4552805082435333e-05, 'epoch': 0.37}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807
[2024-07-31 11:07:35,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.76 | bwd_microstep: 5502.19 | bwd_inner_microstep: 5404.35 | bwd_allreduce_microstep: 97.77 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2291
[2024-07-31 11:07:44,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.07 | bwd_microstep: 5266.28 | bwd_inner_microstep: 4857.43 | bwd_allreduce_microstep: 408.78 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3806
[2024-07-31 11:07:53,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.08 | bwd_microstep: 5035.76 | bwd_inner_microstep: 5016.29 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3821
[2024-07-31 11:08:01,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.09 | bwd_microstep: 5111.30 | bwd_inner_microstep: 5071.27 | bwd_allreduce_microstep: 39.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738
[2024-07-31 11:08:10,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.19 | bwd_microstep: 5190.60 | bwd_inner_microstep: 5132.31 | bwd_allreduce_microstep: 58.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 11:08:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.20 | bwd_microstep: 5051.30 | bwd_inner_microstep: 4990.73 | bwd_allreduce_microstep: 60.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703
[2024-07-31 11:08:27,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.74 | bwd_microstep: 4897.89 | bwd_inner_microstep: 4878.47 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679
[2024-07-31 11:08:36,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71
[2024-07-31 11:08:36,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.13 | bwd_microstep: 4936.76 | bwd_inner_microstep: 4910.65 | bwd_allreduce_microstep: 26.04 | step_microstep: 182.48
[2024-07-31 11:08:36,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29215.15 | bwd: 40992.07 | bwd_inner: 40261.43 | bwd_allreduce: 730.14 | step: 183.06
37%|███▋ | 455/1230 [8:56:42<15:07:18, 70.24s/it]
{'loss': 1.2375, 'learning_rate': 1.4529343297639638e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3974
[2024-07-31 11:08:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.16 | bwd_microstep: 5306.23 | bwd_inner_microstep: 5287.16 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3789
[2024-07-31 11:08:55,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.64 | bwd_microstep: 5540.77 | bwd_inner_microstep: 5441.08 | bwd_allreduce_microstep: 99.62 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 11:09:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.25 | bwd_microstep: 5218.00 | bwd_inner_microstep: 4813.21 | bwd_allreduce_microstep: 404.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3891
[2024-07-31 11:09:12,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3265.46 | bwd_microstep: 4923.33 | bwd_inner_microstep: 4903.96 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618
[2024-07-31 11:09:20,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.14 | bwd_microstep: 5153.81 | bwd_inner_microstep: 5078.86 | bwd_allreduce_microstep: 74.88 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818
[2024-07-31 11:09:29,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.47 | bwd_microstep: 5046.52 | bwd_inner_microstep: 5027.25 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3645
[2024-07-31 11:09:38,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.68 | bwd_microstep: 4968.88 | bwd_inner_microstep: 4902.44 | bwd_allreduce_microstep: 66.38 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 11:09:46,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 11:09:46,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.59 | bwd_microstep: 5035.34 | bwd_inner_microstep: 4978.32 | bwd_allreduce_microstep: 56.96 | step_microstep: 181.07
[2024-07-31 11:09:46,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28795.27 | bwd: 41192.87 | bwd_inner: 40432.23 | bwd_allreduce: 760.16 | step: 181.66
37%|███▋ | 456/1230 [8:57:52<15:06:26, 70.27s/it]
{'loss': 1.1577, 'learning_rate': 1.4505850103863003e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3996
[2024-07-31 11:09:56,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3843.14 | bwd_microstep: 5274.33 | bwd_inner_microstep: 5255.19 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2356
[2024-07-31 11:10:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.06 | bwd_microstep: 5341.01 | bwd_inner_microstep: 4933.61 | bwd_allreduce_microstep: 407.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747
[2024-07-31 11:10:13,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.78 | bwd_microstep: 4990.06 | bwd_inner_microstep: 4970.70 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611
[2024-07-31 11:10:22,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.28 | bwd_microstep: 5206.16 | bwd_inner_microstep: 5122.14 | bwd_allreduce_microstep: 83.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 11:10:30,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.98 | bwd_microstep: 4804.50 | bwd_inner_microstep: 4766.12 | bwd_allreduce_microstep: 38.31 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723
[2024-07-31 11:10:39,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.87 | bwd_microstep: 4966.52 | bwd_inner_microstep: 4933.39 | bwd_allreduce_microstep: 33.06 | step_microstep: 0.08
dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2131
[2024-07-31 11:10:47,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.82 | bwd_microstep: 5241.34 | bwd_inner_microstep: 4835.18 | bwd_allreduce_microstep: 406.10 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689
[2024-07-31 11:10:56,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 11:10:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.28 | bwd_microstep: 4928.19 | bwd_inner_microstep: 4901.00 | bwd_allreduce_microstep: 27.12 | step_microstep: 181.88
[2024-07-31 11:10:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28739.11 | bwd: 40752.10 | bwd_inner: 39717.26 | bwd_allreduce: 1034.33 | step: 182.57
37%|███▋ | 457/1230 [8:59:02<15:03:33, 70.13s/it]
{'loss': 1.2174, 'learning_rate': 1.448232566402028e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3865
[2024-07-31 11:11:05,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.69 | bwd_microstep: 5119.85 | bwd_inner_microstep: 5100.75 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3840
[2024-07-31 11:11:14,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.10 | bwd_microstep: 5254.36 | bwd_inner_microstep: 5198.01 | bwd_allreduce_microstep: 56.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846
[2024-07-31 11:11:23,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.06 | bwd_microstep: 5104.29 | bwd_inner_microstep: 5084.92 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274
[2024-07-31 11:11:31,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3066.89 | bwd_microstep: 5035.60 | bwd_inner_microstep: 4647.04 | bwd_allreduce_microstep: 388.49 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 11:11:40,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.56 | bwd_microstep: 5199.81 | bwd_inner_microstep: 5116.01 | bwd_allreduce_microstep: 83.74 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3831
[2024-07-31 11:11:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.82 | bwd_microstep: 5059.54 | bwd_inner_microstep: 5040.21 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 11:11:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3350.19 | bwd_microstep: 4858.56 | bwd_inner_microstep: 4831.95 | bwd_allreduce_microstep: 26.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711
[2024-07-31 11:12:06,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 11:12:06,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.21 | bwd_microstep: 5109.72 | bwd_inner_microstep: 5045.30 | bwd_allreduce_microstep: 64.35 | step_microstep: 182.77
[2024-07-31 11:12:06,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28639.42 | bwd: 40741.70 | bwd_inner: 40064.12 | bwd_allreduce: 677.08 | step: 183.37
37%|███▋ | 458/1230 [9:00:12<15:00:46, 70.01s/it]
{'loss': 1.1911, 'learning_rate': 1.4458770141242992e-05, 'epoch': 0.37}
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 3385
[2024-07-31 11:12:15,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.05 | bwd_microstep: 5639.31 | bwd_inner_microstep: 5212.75 | bwd_allreduce_microstep: 426.49 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839
[2024-07-31 11:12:24,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.64 | bwd_microstep: 5372.40 | bwd_inner_microstep: 5299.07 | bwd_allreduce_microstep: 73.26 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3810
[2024-07-31 11:12:33,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.93 | bwd_microstep: 5301.04 | bwd_inner_microstep: 5205.82 | bwd_allreduce_microstep: 95.15 | step_microstep: 0.08
dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1196
[2024-07-31 11:12:42,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.33 | bwd_microstep: 5207.81 | bwd_inner_microstep: 4806.61 | bwd_allreduce_microstep: 401.13 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 11:12:51,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.52 | bwd_microstep: 4980.18 | bwd_inner_microstep: 4959.29 | bwd_allreduce_microstep: 20.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706
[2024-07-31 11:13:00,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.84 | bwd_microstep: 5050.43 | bwd_inner_microstep: 5005.64 | bwd_allreduce_microstep: 44.72 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705
[2024-07-31 11:13:08,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.22 | bwd_microstep: 4889.90 | bwd_inner_microstep: 4870.45 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3710
[2024-07-31 11:13:17,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 11:13:17,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.76 | bwd_microstep: 5034.14 | bwd_inner_microstep: 4990.62 | bwd_allreduce_microstep: 43.45 | step_microstep: 182.94
[2024-07-31 11:13:17,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29338.20 | bwd: 41475.18 | bwd_inner: 40350.20 | bwd_allreduce: 1124.50 | step: 183.53
37%|███▋ | 459/1230 [9:01:23<15:03:59, 70.35s/it]
{'loss': 1.1826, 'learning_rate': 1.443518369887821e-05, 'epoch': 0.37}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 11:13:26,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.32 | bwd_microstep: 5382.97 | bwd_inner_microstep: 5363.94 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3588
[2024-07-31 11:13:35,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.54 | bwd_microstep: 5295.37 | bwd_inner_microstep: 5159.89 | bwd_allreduce_microstep: 135.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3861
[2024-07-31 11:13:44,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.94 | bwd_microstep: 5217.96 | bwd_inner_microstep: 5164.24 | bwd_allreduce_microstep: 53.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769
[2024-07-31 11:13:53,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.56 | bwd_microstep: 5165.67 | bwd_inner_microstep: 5110.92 | bwd_allreduce_microstep: 54.69 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 11:14:02,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.18 | bwd_microstep: 5184.65 | bwd_inner_microstep: 4781.37 | bwd_allreduce_microstep: 403.21 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702
[2024-07-31 11:14:11,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.29 | bwd_microstep: 4907.69 | bwd_inner_microstep: 4882.14 | bwd_allreduce_microstep: 25.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 11:14:19,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.70 | bwd_microstep: 5057.02 | bwd_inner_microstep: 4998.61 | bwd_allreduce_microstep: 58.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692
[2024-07-31 11:14:28,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 11:14:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.62 | bwd_microstep: 5097.40 | bwd_inner_microstep: 5050.75 | bwd_allreduce_microstep: 46.58 | step_microstep: 409.03
[2024-07-31 11:14:28,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29368.06 | bwd: 41308.73 | bwd_inner: 40511.79 | bwd_allreduce: 796.44 | step: 409.64
37%|███▋ | 460/1230 [9:02:34<15:06:15, 70.62s/it]
{'loss': 1.1723, 'learning_rate': 1.4411566500487425e-05, 'epoch': 0.37}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3993
[2024-07-31 11:14:38,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.28 | bwd_microstep: 5450.40 | bwd_inner_microstep: 5393.96 | bwd_allreduce_microstep: 56.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572
[2024-07-31 11:14:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.20 | bwd_microstep: 5189.65 | bwd_inner_microstep: 5099.60 | bwd_allreduce_microstep: 89.96 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3812
[2024-07-31 11:14:55,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.62 | bwd_microstep: 4861.20 | bwd_inner_microstep: 4838.63 | bwd_allreduce_microstep: 22.49 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762
[2024-07-31 11:15:03,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.67 | bwd_microstep: 5072.85 | bwd_inner_microstep: 5045.81 | bwd_allreduce_microstep: 26.96 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732
[2024-07-31 11:15:12,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.33 | bwd_microstep: 5188.88 | bwd_inner_microstep: 5128.74 | bwd_allreduce_microstep: 60.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 11:15:21,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.56 | bwd_microstep: 5103.08 | bwd_inner_microstep: 5031.95 | bwd_allreduce_microstep: 71.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 11:15:30,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.06 | bwd_microstep: 5131.38 | bwd_inner_microstep: 5062.85 | bwd_allreduce_microstep: 68.46 | step_microstep: 0.19
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2924
[2024-07-31 11:15:38,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 11:15:38,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.72 | bwd_microstep: 5077.12 | bwd_inner_microstep: 4682.55 | bwd_allreduce_microstep: 394.50 | step_microstep: 182.05
[2024-07-31 11:15:38,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28643.34 | bwd: 41074.51 | bwd_inner: 40284.00 | bwd_allreduce: 789.97 | step: 182.75
37%|███▋ | 461/1230 [9:03:44<15:02:54, 70.45s/it]
{'loss': 1.1527, 'learning_rate': 1.4387918709845395e-05, 'epoch': 0.37}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2345
[2024-07-31 11:15:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.37 | bwd_microstep: 5301.93 | bwd_inner_microstep: 4898.82 | bwd_allreduce_microstep: 403.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762
[2024-07-31 11:15:56,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.02 | bwd_microstep: 5129.63 | bwd_inner_microstep: 5078.88 | bwd_allreduce_microstep: 50.68 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 11:16:05,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.93 | bwd_microstep: 5184.16 | bwd_inner_microstep: 5103.36 | bwd_allreduce_microstep: 80.74 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3795
[2024-07-31 11:16:14,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.78 | bwd_microstep: 5234.50 | bwd_inner_microstep: 5158.83 | bwd_allreduce_microstep: 75.60 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 11:16:23,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.77 | bwd_microstep: 5384.33 | bwd_inner_microstep: 4966.55 | bwd_allreduce_microstep: 417.71 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2089
[2024-07-31 11:16:31,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3005.04 | bwd_microstep: 4895.71 | bwd_inner_microstep: 4518.71 | bwd_allreduce_microstep: 376.94 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699
[2024-07-31 11:16:39,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.89 | bwd_microstep: 4970.94 | bwd_inner_microstep: 4936.17 | bwd_allreduce_microstep: 34.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162
[2024-07-31 11:16:48,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 11:16:48,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.45 | bwd_microstep: 5137.28 | bwd_inner_microstep: 4739.09 | bwd_allreduce_microstep: 398.12 | step_microstep: 181.79
[2024-07-31 11:16:48,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28226.15 | bwd: 41238.46 | bwd_inner: 39400.34 | bwd_allreduce: 1837.63 | step: 182.38
38%|███▊ | 462/1230 [9:04:54<14:59:13, 70.25s/it]
{'loss': 1.2365, 'learning_rate': 1.4364240490939032e-05, 'epoch': 0.38}
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3569
[2024-07-31 11:16:57,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.74 | bwd_microstep: 5278.62 | bwd_inner_microstep: 5151.18 | bwd_allreduce_microstep: 127.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569
[2024-07-31 11:17:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3369.88 | bwd_microstep: 5168.61 | bwd_inner_microstep: 5091.23 | bwd_allreduce_microstep: 77.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679
[2024-07-31 11:17:14,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.30 | bwd_microstep: 5095.91 | bwd_inner_microstep: 5029.15 | bwd_allreduce_microstep: 66.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772
[2024-07-31 11:17:23,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.65 | bwd_microstep: 5081.20 | bwd_inner_microstep: 5061.82 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 11:17:32,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.75 | bwd_microstep: 5037.41 | bwd_inner_microstep: 5010.24 | bwd_allreduce_microstep: 27.10 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201
[2024-07-31 11:17:41,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.20 | bwd_microstep: 5180.25 | bwd_inner_microstep: 4777.94 | bwd_allreduce_microstep: 402.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3726 [2024-07-31 11:17:50,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.37 | bwd_microstep: 4985.57 | bwd_inner_microstep: 4966.18 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 11:17:58,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 11:17:58,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.75 | bwd_microstep: 4978.32 | bwd_inner_microstep: 4923.76 | bwd_allreduce_microstep: 54.49 | step_microstep: 182.66 [2024-07-31 11:17:58,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28891.54 | bwd: 40805.87 | bwd_inner: 40011.46 | bwd_allreduce: 793.92 | step: 183.24 38%|███▊ | 463/1230 [9:06:04<14:57:12, 70.19s/it] {'loss': 1.2061, 'learning_rate': 1.434053200796625e-05, 'epoch': 0.38} 38%|███▊ | 463/1230 [9:06:04<14:57:12, 70.19s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3914 [2024-07-31 11:18:07,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.07 | bwd_microstep: 5384.25 | bwd_inner_microstep: 5320.29 | bwd_allreduce_microstep: 63.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-31 11:18:16,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.28 | bwd_microstep: 5141.70 | bwd_inner_microstep: 5064.03 | bwd_allreduce_microstep: 77.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-31 11:18:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.51 | bwd_microstep: 5139.62 | bwd_inner_microstep: 5104.96 | bwd_allreduce_microstep: 34.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-31 11:18:34,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3630.30 | bwd_microstep: 5193.25 | bwd_inner_microstep: 5137.27 | bwd_allreduce_microstep: 55.92 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 11:18:43,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.01 | bwd_microstep: 5172.67 | bwd_inner_microstep: 5093.68 | bwd_allreduce_microstep: 78.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 11:18:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.67 | bwd_microstep: 5072.96 | bwd_inner_microstep: 5028.95 | bwd_allreduce_microstep: 43.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 11:19:00,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.58 | bwd_microstep: 5079.77 | bwd_inner_microstep: 4686.60 | bwd_allreduce_microstep: 393.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-07-31 11:19:09,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 11:19:09,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.19 | bwd_microstep: 5102.15 | bwd_inner_microstep: 4707.45 | bwd_allreduce_microstep: 394.63 | step_microstep: 181.58 [2024-07-31 11:19:09,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29094.50 | bwd: 41286.34 | bwd_inner: 40143.16 | bwd_allreduce: 1142.68 | step: 182.17 38%|███▊ | 464/1230 [9:07:15<14:58:04, 70.35s/it] {'loss': 1.2218, 'learning_rate': 1.4316793425334834e-05, 'epoch': 0.38} 38%|███▊ | 464/1230 [9:07:15<14:58:04, 70.35s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4038 [2024-07-31 11:19:18,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.59 | bwd_microstep: 5199.60 | bwd_inner_microstep: 5167.99 | 
bwd_allreduce_microstep: 31.55 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3540 [2024-07-31 11:19:27,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.48 | bwd_microstep: 5088.04 | bwd_inner_microstep: 5006.80 | bwd_allreduce_microstep: 81.17 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2855 [2024-07-31 11:19:35,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3280.73 | bwd_microstep: 5229.13 | bwd_inner_microstep: 4824.81 | bwd_allreduce_microstep: 404.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-31 11:19:44,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.20 | bwd_microstep: 5188.66 | bwd_inner_microstep: 5131.71 | bwd_allreduce_microstep: 56.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2088 [2024-07-31 11:19:53,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.17 | bwd_microstep: 5235.00 | bwd_inner_microstep: 4828.72 | bwd_allreduce_microstep: 406.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-07-31 11:20:02,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.38 | bwd_microstep: 5026.55 | bwd_inner_microstep: 5007.24 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 11:20:10,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.59 | bwd_microstep: 5124.61 | bwd_inner_microstep: 4724.25 | bwd_allreduce_microstep: 400.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 11:20:19,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 
[2024-07-31 11:20:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.75 | bwd_microstep: 4857.92 | bwd_inner_microstep: 4838.56 | bwd_allreduce_microstep: 19.29 | step_microstep: 182.49
[2024-07-31 11:20:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28915.81 | bwd: 40949.48 | bwd_inner: 39530.02 | bwd_allreduce: 1418.98 | step: 183.09
38%|███▊ | 465/1230 [9:08:25<14:56:19, 70.30s/it]
{'loss': 1.1734, 'learning_rate': 1.4293024907661295e-05, 'epoch': 0.38}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3936
[2024-07-31 11:20:29,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.33 | bwd_microstep: 5421.74 | bwd_inner_microstep: 5366.48 | bwd_allreduce_microstep: 55.19 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2381
[2024-07-31 11:20:37,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3332.17 | bwd_microstep: 5189.65 | bwd_inner_microstep: 4788.92 | bwd_allreduce_microstep: 400.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825
[2024-07-31 11:20:46,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.53 | bwd_microstep: 5126.38 | bwd_inner_microstep: 5083.08 | bwd_allreduce_microstep: 43.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226
[2024-07-31 11:20:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.42 | bwd_microstep: 5197.27 | bwd_inner_microstep: 4793.63 | bwd_allreduce_microstep: 403.58 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3611
[2024-07-31 11:21:03,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.35 | bwd_microstep: 5086.70 | bwd_inner_microstep: 5007.98 | bwd_allreduce_microstep: 78.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732
[2024-07-31 11:21:12,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.55 | bwd_microstep: 5175.46 | bwd_inner_microstep: 5118.76 | bwd_allreduce_microstep: 56.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635
[2024-07-31 11:21:20,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.80 | bwd_microstep: 4976.48 | bwd_inner_microstep: 4924.30 | bwd_allreduce_microstep: 52.11 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669
[2024-07-31 11:21:29,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 11:21:29,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.74 | bwd_microstep: 4924.11 | bwd_inner_microstep: 4896.15 | bwd_allreduce_microstep: 27.90 | step_microstep: 182.26
[2024-07-31 11:21:29,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28672.79 | bwd: 41097.78 | bwd_inner: 39979.24 | bwd_allreduce: 1118.05 | step: 182.96
38%|███▊ | 466/1230 [9:09:35<14:54:24, 70.24s/it]
{'loss': 1.212, 'learning_rate': 1.4269226619769725e-05, 'epoch': 0.38}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2354
[2024-07-31 11:21:38,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.66 | bwd_microstep: 5400.01 | bwd_inner_microstep: 4986.17 | bwd_allreduce_microstep: 413.77 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 11:21:48,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.40 | bwd_microstep: 5388.02 | bwd_inner_microstep: 5303.90 | bwd_allreduce_microstep: 84.05 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3096
[2024-07-31 11:21:56,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.08 | bwd_microstep: 5238.27 | bwd_inner_microstep: 4948.74 | bwd_allreduce_microstep: 289.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242
[2024-07-31 11:22:05,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.14 | bwd_microstep: 5311.32 | bwd_inner_microstep: 4899.89 | bwd_allreduce_microstep: 411.36 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170
[2024-07-31 11:22:14,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.64 | bwd_microstep: 5048.28 | bwd_inner_microstep: 4655.87 | bwd_allreduce_microstep: 392.35 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3708
[2024-07-31 11:22:23,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.33 | bwd_microstep: 5116.21 | bwd_inner_microstep: 5029.26 | bwd_allreduce_microstep: 86.88 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3666
[2024-07-31 11:22:31,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.66 | bwd_microstep: 4961.27 | bwd_inner_microstep: 4941.90 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2932
[2024-07-31 11:22:40,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 11:22:40,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.30 | bwd_microstep: 5068.99 | bwd_inner_microstep: 4682.53 | bwd_allreduce_microstep: 386.39 | step_microstep: 182.08
[2024-07-31 11:22:40,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28915.13 | bwd: 41532.36 | bwd_inner: 39448.19 | bwd_allreduce: 2083.68 | step: 182.68
38%|███▊ | 467/1230 [9:10:46<14:55:16, 70.40s/it]
{'loss': 1.1607, 'learning_rate': 1.424539872669067e-05, 'epoch': 0.38}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3966
[2024-07-31 11:22:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3464.68 | bwd_microstep: 5403.45 | bwd_inner_microstep: 5346.79 | bwd_allreduce_microstep: 56.59 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3899
[2024-07-31 11:22:58,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.73 | bwd_microstep: 5128.04 | bwd_inner_microstep: 5108.66 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2058
[2024-07-31 11:23:07,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.45 | bwd_microstep: 5169.84 | bwd_inner_microstep: 4767.61 | bwd_allreduce_microstep: 402.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640
[2024-07-31 11:23:15,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.34 | bwd_microstep: 5213.40 | bwd_inner_microstep: 5125.65 | bwd_allreduce_microstep: 87.67 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716
[2024-07-31 11:23:24,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.19 | bwd_microstep: 4979.07 | bwd_inner_microstep: 4959.66 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 11:23:33,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.48 | bwd_microstep: 4976.82 | bwd_inner_microstep: 4957.50 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752
[2024-07-31 11:23:42,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.85 | bwd_microstep: 5093.73 | bwd_inner_microstep: 5049.41 | bwd_allreduce_microstep: 44.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127
[2024-07-31 11:23:51,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 11:23:51,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.69 | bwd_microstep: 5386.57 | bwd_inner_microstep: 4851.70 | bwd_allreduce_microstep: 534.80 | step_microstep: 181.28
[2024-07-31 11:23:51,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28883.32 | bwd: 41350.90 | bwd_inner: 40166.94 | bwd_allreduce: 1183.47 | step: 181.85
38%|███▊ | 468/1230 [9:11:57<14:54:43, 70.45s/it]
{'loss': 1.1388, 'learning_rate': 1.4221541393659966e-05, 'epoch': 0.38}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 11:24:00,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3858.87 | bwd_microstep: 5723.02 | bwd_inner_microstep: 5703.93 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3832
[2024-07-31 11:24:09,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.83 | bwd_microstep: 5105.69 | bwd_inner_microstep: 5066.72 | bwd_allreduce_microstep: 38.91 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 11:24:18,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.32 | bwd_microstep: 5165.74 | bwd_inner_microstep: 5082.23 | bwd_allreduce_microstep: 83.45 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3873
[2024-07-31 11:24:26,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3163.87 | bwd_microstep: 4830.25 | bwd_inner_microstep: 4810.96 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2106
[2024-07-31 11:24:34,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.05 | bwd_microstep: 5141.25 | bwd_inner_microstep: 4742.16 | bwd_allreduce_microstep: 399.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677
[2024-07-31 11:24:43,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.24 | bwd_microstep: 5207.92 | bwd_inner_microstep: 5128.42 | bwd_allreduce_microstep: 79.43 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2119
[2024-07-31 11:24:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.59 | bwd_microstep: 5138.30 | bwd_inner_microstep: 4740.91 | bwd_allreduce_microstep: 397.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 11:25:01,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 11:25:01,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.61 | bwd_microstep: 5108.97 | bwd_inner_microstep: 4712.53 | bwd_allreduce_microstep: 396.38 | step_microstep: 181.74
[2024-07-31 11:25:01,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28357.29 | bwd: 41421.13 | bwd_inner: 39987.81 | bwd_allreduce: 1432.83 | step: 182.33
38%|███▊ | 469/1230 [9:13:07<14:52:14, 70.35s/it]
{'loss': 1.203, 'learning_rate': 1.4197654786117604e-05, 'epoch': 0.38}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2827
[2024-07-31 11:25:10,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.60 | bwd_microstep: 5478.86 | bwd_inner_microstep: 5056.00 | bwd_allreduce_microstep: 422.79 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3848
[2024-07-31 11:25:19,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.58 | bwd_microstep: 5102.53 | bwd_inner_microstep: 5059.79 | bwd_allreduce_microstep: 42.67 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263
[2024-07-31 11:25:27,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.66 | bwd_microstep: 5165.52 | bwd_inner_microstep: 4763.52 | bwd_allreduce_microstep: 401.94 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225
[2024-07-31 11:25:36,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.35 | bwd_microstep: 5213.66 | bwd_inner_microstep: 4807.57 | bwd_allreduce_microstep: 406.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 11:25:45,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.22 | bwd_microstep: 5212.01 | bwd_inner_microstep: 5149.07 | bwd_allreduce_microstep: 62.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701
[2024-07-31 11:25:54,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.76 | bwd_microstep: 5161.77 | bwd_inner_microstep: 5084.99 | bwd_allreduce_microstep: 76.71 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3693
[2024-07-31 11:26:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.00 | bwd_microstep: 5041.53 | bwd_inner_microstep: 4967.88 | bwd_allreduce_microstep: 73.58 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2146
[2024-07-31 11:26:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 11:26:12,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.34 | bwd_microstep: 5270.48 | bwd_inner_microstep: 4860.54 | bwd_allreduce_microstep: 409.88 | step_microstep: 181.75
[2024-07-31 11:26:12,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28772.41 | bwd: 41646.34 | bwd_inner: 39749.30 | bwd_allreduce: 1896.55 | step: 182.34
38%|███▊ | 470/1230 [9:14:17<14:52:35, 70.47s/it]
{'loss': 1.1801, 'learning_rate': 1.4173739069706584e-05, 'epoch': 0.38}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3912
[2024-07-31 11:26:21,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.65 | bwd_microstep: 5411.93 | bwd_inner_microstep: 5345.81 | bwd_allreduce_microstep: 66.05 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2033
[2024-07-31 11:26:29,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.34 | bwd_microstep: 5194.93 | bwd_inner_microstep: 4790.27 | bwd_allreduce_microstep: 404.59 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3720
[2024-07-31 11:26:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.45 | bwd_microstep: 5123.24 | bwd_inner_microstep: 5084.53 | bwd_allreduce_microstep: 38.64 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170
[2024-07-31 11:26:47,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.74 | bwd_microstep: 5190.56 | bwd_inner_microstep: 4786.75 | bwd_allreduce_microstep: 403.73 | step_microstep: 0.19
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3135
[2024-07-31 11:26:56,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.63 | bwd_microstep: 5197.95 | bwd_inner_microstep: 4920.08 | bwd_allreduce_microstep: 277.80 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 11:27:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.68 | bwd_microstep: 5040.67 | bwd_inner_microstep: 4983.78 | bwd_allreduce_microstep: 56.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674
[2024-07-31 11:27:13,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.89 | bwd_microstep: 4900.57 | bwd_inner_microstep: 4874.78 | bwd_allreduce_microstep: 25.72 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 11:27:22,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 11:27:22,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.61 | bwd_microstep: 5038.96 | bwd_inner_microstep: 4977.72 | bwd_allreduce_microstep: 61.17 | step_microstep: 181.52
[2024-07-31 11:27:22,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28828.89 | bwd: 41098.78 | bwd_inner: 39763.67 | bwd_allreduce: 1334.64 | step: 182.22
38%|███▊ | 471/1230 [9:15:28<14:50:37, 70.41s/it]
{'loss': 1.1927, 'learning_rate': 1.414979441027176e-05, 'epoch': 0.38}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3938
[2024-07-31 11:27:31,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3814.37 | bwd_microstep: 5225.01 | bwd_inner_microstep: 5205.94 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3872
[2024-07-31 11:27:39,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.38 | bwd_microstep: 5114.32 | bwd_inner_microstep: 5071.21 | bwd_allreduce_microstep: 43.03 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2066
[2024-07-31 11:27:48,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.63 | bwd_microstep: 5182.92 | bwd_inner_microstep: 4779.77 | bwd_allreduce_microstep: 403.08 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756
[2024-07-31 11:27:57,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.53 | bwd_microstep: 5001.25 | bwd_inner_microstep: 4981.85 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186
[2024-07-31 11:28:06,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.44 | bwd_microstep: 5156.09 | bwd_inner_microstep: 4756.51 | bwd_allreduce_microstep: 399.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700
[2024-07-31 11:28:14,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.33 | bwd_microstep: 5051.04 | bwd_inner_microstep: 5007.84 | bwd_allreduce_microstep: 43.13 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685
[2024-07-31 11:28:23,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.94 | bwd_microstep: 4891.00 | bwd_inner_microstep: 4870.62 | bwd_allreduce_microstep: 20.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 11:28:32,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 11:28:32,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.77 | bwd_microstep: 4886.45 | bwd_inner_microstep: 4866.96 | bwd_allreduce_microstep: 19.42 | step_microstep: 182.03
[2024-07-31 11:28:32,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29109.30 | bwd: 40508.07 | bwd_inner: 39540.66 | bwd_allreduce: 966.90 | step: 182.62
38%|███▊ | 472/1230 [9:16:38<14:47:45, 70.27s/it]
{'loss': 1.1707, 'learning_rate': 1.4125820973858693e-05, 'epoch': 0.38}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3965
[2024-07-31 11:28:41,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.75 | bwd_microstep: 5212.89 | bwd_inner_microstep: 5179.46 | bwd_allreduce_microstep: 33.36 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197
[2024-07-31 11:28:50,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.50 | bwd_microstep: 5160.31 | bwd_inner_microstep: 4755.02 | bwd_allreduce_microstep: 405.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604
[2024-07-31 11:28:58,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.80 | bwd_microstep: 5180.50 | bwd_inner_microstep: 5101.99 | bwd_allreduce_microstep: 78.44 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2243
[2024-07-31 11:29:07,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.98 | bwd_microstep: 5258.41 | bwd_inner_microstep: 4848.96 | bwd_allreduce_microstep: 409.38 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2108
[2024-07-31 11:29:16,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.82 | bwd_microstep: 5162.23 | bwd_inner_microstep: 4760.08 | bwd_allreduce_microstep: 402.08 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2914
[2024-07-31 11:29:25,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.92 | bwd_microstep: 5226.06 | bwd_inner_microstep: 4817.82 | bwd_allreduce_microstep: 408.17 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141
[2024-07-31 11:29:33,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.44 | bwd_microstep: 5121.21 | bwd_inner_microstep: 4722.88 | bwd_allreduce_microstep: 398.26 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137
[2024-07-31 11:29:42,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 11:29:42,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.49 | bwd_microstep: 5076.11 | bwd_inner_microstep: 4683.81 | bwd_allreduce_microstep: 392.23 | step_microstep: 182.99
[2024-07-31 11:29:42,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28676.59 | bwd: 41397.70 | bwd_inner: 38869.96 | bwd_allreduce: 2527.25 | step: 183.57
38%|███▊ | 473/1230 [9:17:48<14:47:04, 70.31s/it]
{'loss': 1.2063, 'learning_rate': 1.41018189267125e-05, 'epoch': 0.38}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3915
[2024-07-31 11:29:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.96 | bwd_microstep: 5353.69 | bwd_inner_microstep: 5292.95 | bwd_allreduce_microstep: 60.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3934
[2024-07-31 11:30:00,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.43 | bwd_microstep: 5164.19 | bwd_inner_microstep: 5144.86 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3751
[2024-07-31 11:30:09,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.62 | bwd_microstep: 5232.12 | bwd_inner_microstep: 5155.59 | bwd_allreduce_microstep: 76.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 11:30:18,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.11 | bwd_microstep: 5120.51 | bwd_inner_microstep: 5058.90 | bwd_allreduce_microstep: 61.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725
[2024-07-31 11:30:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.43 | bwd_microstep: 5153.36 | bwd_inner_microstep: 5097.53 | bwd_allreduce_microstep: 55.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 11:30:35,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.26 | bwd_microstep: 5232.96 | bwd_inner_microstep: 5144.22 | bwd_allreduce_microstep: 88.67 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 644
[2024-07-31 11:30:44,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.82 | bwd_microstep: 5267.52 | bwd_inner_microstep: 4863.94 | bwd_allreduce_microstep: 403.51 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3689
[2024-07-31 11:30:53,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 11:30:53,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.85 | bwd_microstep: 5079.54 | bwd_inner_microstep: 5002.96 | bwd_allreduce_microstep: 76.51 | step_microstep: 181.79
[2024-07-31 11:30:53,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29008.40 | bwd: 41603.88 | bwd_inner: 40760.89 | bwd_allreduce: 842.50 | step: 182.38
39%|███▊ | 474/1230 [9:18:59<14:48:17, 70.50s/it]
{'loss': 1.1656, 'learning_rate': 1.4077788435276701e-05, 'epoch': 0.39}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635
[2024-07-31 11:31:02,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.45 | bwd_microstep: 5592.86 | bwd_inner_microstep: 5412.17 | bwd_allreduce_microstep: 180.62 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2802
[2024-07-31 11:31:11,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.56 | bwd_microstep: 5171.69 | bwd_inner_microstep: 4768.37 | bwd_allreduce_microstep: 403.26 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256
[2024-07-31 11:31:20,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.36 | bwd_microstep: 5122.13 | bwd_inner_microstep: 4723.77 | bwd_allreduce_microstep: 398.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724
[2024-07-31 11:31:29,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.91 | bwd_microstep: 5015.59 | bwd_inner_microstep: 4988.17 | bwd_allreduce_microstep: 27.36 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 11:31:37,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.61 | bwd_microstep: 4893.63 | bwd_inner_microstep: 4874.15 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678
[2024-07-31 11:31:46,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.95 | bwd_microstep: 5057.46 | bwd_inner_microstep: 5012.36 | bwd_allreduce_microstep: 45.03 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671
[2024-07-31 11:31:54,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.37 | bwd_microstep: 4721.83 | bwd_inner_microstep: 4694.79 | bwd_allreduce_microstep: 26.98 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124
[2024-07-31 11:32:03,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 11:32:03,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.50 | bwd_microstep: 5091.32 | bwd_inner_microstep: 4694.83 | bwd_allreduce_microstep: 396.43 | step_microstep: 181.54 [2024-07-31 11:32:03,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28674.62 | bwd: 40666.51 | bwd_inner: 39168.53 | bwd_allreduce: 1497.48 | step: 182.14 39%|███▊ | 475/1230 [9:20:09<14:44:00, 70.25s/it] {'loss': 1.2206, 'learning_rate': 1.4053729666192064e-05, 'epoch': 0.39} 39%|███▊ | 475/1230 [9:20:09<14:44:00, 70.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4025 [2024-07-31 11:32:11,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.62 | bwd_microstep: 5071.19 | bwd_inner_microstep: 5052.12 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3839 [2024-07-31 11:32:20,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.73 | bwd_microstep: 5087.54 | bwd_inner_microstep: 5065.42 | bwd_allreduce_microstep: 22.05 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 11:32:29,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.93 | bwd_microstep: 5036.68 | bwd_inner_microstep: 5017.36 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 11:32:38,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.28 | bwd_microstep: 5206.89 | bwd_inner_microstep: 4803.08 | bwd_allreduce_microstep: 403.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 11:32:46,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.31 | bwd_microstep: 5009.10 | bwd_inner_microstep: 4953.25 | 
bwd_allreduce_microstep: 55.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 11:32:55,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.44 | bwd_microstep: 4918.67 | bwd_inner_microstep: 4893.51 | bwd_allreduce_microstep: 25.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 11:33:04,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.38 | bwd_microstep: 5070.91 | bwd_inner_microstep: 5026.89 | bwd_allreduce_microstep: 43.95 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 11:33:12,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-31 11:33:12,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3160.57 | bwd_microstep: 4917.12 | bwd_inner_microstep: 4536.62 | bwd_allreduce_microstep: 380.44 | step_microstep: 181.63 [2024-07-31 11:33:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28736.15 | bwd: 40318.08 | bwd_inner: 39348.19 | bwd_allreduce: 969.40 | step: 182.33 39%|███▊ | 476/1230 [9:21:18<14:39:34, 69.99s/it] {'loss': 1.1147, 'learning_rate': 1.4029642786295452e-05, 'epoch': 0.39} 39%|███▊ | 476/1230 [9:21:18<14:39:34, 69.99s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2801 [2024-07-31 11:33:21,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.80 | bwd_microstep: 5437.33 | bwd_inner_microstep: 5020.15 | bwd_allreduce_microstep: 417.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 11:33:30,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.85 | bwd_microstep: 5247.37 | bwd_inner_microstep: 4843.31 | bwd_allreduce_microstep: 403.99 | step_microstep: 0.08 dynamic ViT batch size: 23, 
images per sample: 11.5, dynamic token length: 3852 [2024-07-31 11:33:39,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.64 | bwd_microstep: 4995.40 | bwd_inner_microstep: 4976.02 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 11:33:47,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.57 | bwd_microstep: 5144.84 | bwd_inner_microstep: 5065.64 | bwd_allreduce_microstep: 79.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 11:33:56,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.89 | bwd_microstep: 5090.36 | bwd_inner_microstep: 5046.39 | bwd_allreduce_microstep: 43.90 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 11:34:05,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.19 | bwd_microstep: 5131.01 | bwd_inner_microstep: 5065.65 | bwd_allreduce_microstep: 65.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-31 11:34:13,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.67 | bwd_microstep: 5058.52 | bwd_inner_microstep: 4664.11 | bwd_allreduce_microstep: 394.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 11:34:22,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 11:34:22,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.77 | bwd_microstep: 4991.79 | bwd_inner_microstep: 4972.38 | bwd_allreduce_microstep: 19.34 | step_microstep: 182.72 [2024-07-31 11:34:22,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28655.30 | bwd: 41096.60 | bwd_inner: 39653.60 | bwd_allreduce: 1442.51 | 
step: 183.31 39%|███▉ | 477/1230 [9:22:28<14:38:44, 70.02s/it] {'loss': 1.1892, 'learning_rate': 1.400552796261866e-05, 'epoch': 0.39} 39%|███▉ | 477/1230 [9:22:28<14:38:44, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4030 [2024-07-31 11:34:31,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.64 | bwd_microstep: 5212.98 | bwd_inner_microstep: 5180.99 | bwd_allreduce_microstep: 31.92 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2809 [2024-07-31 11:34:40,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.34 | bwd_microstep: 5268.58 | bwd_inner_microstep: 4858.23 | bwd_allreduce_microstep: 410.29 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777 [2024-07-31 11:34:49,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.36 | bwd_microstep: 5083.07 | bwd_inner_microstep: 5055.97 | bwd_allreduce_microstep: 27.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 11:34:58,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.54 | bwd_microstep: 5111.24 | bwd_inner_microstep: 5038.57 | bwd_allreduce_microstep: 72.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-31 11:35:06,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.11 | bwd_microstep: 5040.58 | bwd_inner_microstep: 4646.46 | bwd_allreduce_microstep: 394.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 11:35:15,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.97 | bwd_microstep: 5062.09 | bwd_inner_microstep: 5001.17 | bwd_allreduce_microstep: 60.85 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2119 [2024-07-31 11:35:24,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.37 | bwd_microstep: 5081.97 | bwd_inner_microstep: 4689.62 | bwd_allreduce_microstep: 392.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-31 11:35:33,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 11:35:33,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.43 | bwd_microstep: 5001.08 | bwd_inner_microstep: 4981.79 | bwd_allreduce_microstep: 19.22 | step_microstep: 181.50 [2024-07-31 11:35:33,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29141.66 | bwd: 40861.58 | bwd_inner: 39452.74 | bwd_allreduce: 1408.36 | step: 182.09 39%|███▉ | 478/1230 [9:23:38<14:38:45, 70.11s/it] {'loss': 1.2071, 'learning_rate': 1.3981385362387268e-05, 'epoch': 0.39} 39%|███▉ | 478/1230 [9:23:38<14:38:45, 70.11s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4072 [2024-07-31 11:35:42,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.84 | bwd_microstep: 5342.54 | bwd_inner_microstep: 5309.16 | bwd_allreduce_microstep: 33.32 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3824 [2024-07-31 11:35:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.30 | bwd_microstep: 5249.92 | bwd_inner_microstep: 5170.50 | bwd_allreduce_microstep: 79.36 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3635 [2024-07-31 11:35:59,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.97 | bwd_microstep: 5247.27 | bwd_inner_microstep: 5134.56 | bwd_allreduce_microstep: 112.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 11:36:08,674] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3517.21 | bwd_microstep: 5195.27 | bwd_inner_microstep: 4790.46 | bwd_allreduce_microstep: 404.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 11:36:17,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.81 | bwd_microstep: 5238.60 | bwd_inner_microstep: 4831.45 | bwd_allreduce_microstep: 407.08 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2190 [2024-07-31 11:36:26,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.78 | bwd_microstep: 5275.32 | bwd_inner_microstep: 4864.81 | bwd_allreduce_microstep: 410.44 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2211 [2024-07-31 11:36:35,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.39 | bwd_microstep: 5174.55 | bwd_inner_microstep: 4772.33 | bwd_allreduce_microstep: 402.14 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3663 [2024-07-31 11:36:43,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 11:36:43,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.50 | bwd_microstep: 5057.32 | bwd_inner_microstep: 4976.66 | bwd_allreduce_microstep: 80.59 | step_microstep: 181.57 [2024-07-31 11:36:43,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28756.69 | bwd: 41780.77 | bwd_inner: 39849.86 | bwd_allreduce: 1930.41 | step: 182.17 39%|███▉ | 479/1230 [9:24:49<14:40:24, 70.34s/it] {'loss': 1.2227, 'learning_rate': 1.3957215153019462e-05, 'epoch': 0.39} 39%|███▉ | 479/1230 [9:24:49<14:40:24, 70.34s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4087 [2024-07-31 11:36:53,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3867.98 | bwd_microstep: 5374.72 | 
bwd_inner_microstep: 5355.67 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 11:37:01,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.61 | bwd_microstep: 5210.34 | bwd_inner_microstep: 4804.59 | bwd_allreduce_microstep: 405.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-31 11:37:10,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3067.68 | bwd_microstep: 5025.62 | bwd_inner_microstep: 4638.42 | bwd_allreduce_microstep: 387.12 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 11:37:18,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.74 | bwd_microstep: 4987.65 | bwd_inner_microstep: 4949.04 | bwd_allreduce_microstep: 38.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 11:37:27,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.06 | bwd_microstep: 5167.81 | bwd_inner_microstep: 5111.65 | bwd_allreduce_microstep: 56.09 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3661 [2024-07-31 11:37:36,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.79 | bwd_microstep: 5176.56 | bwd_inner_microstep: 5109.10 | bwd_allreduce_microstep: 67.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 11:37:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.77 | bwd_microstep: 4996.20 | bwd_inner_microstep: 4966.28 | bwd_allreduce_microstep: 29.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670 [2024-07-31 11:37:53,799] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | optimizer_step: 92.76 [2024-07-31 11:37:53,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.56 | bwd_microstep: 4871.36 | bwd_inner_microstep: 4852.07 | bwd_allreduce_microstep: 19.21 | step_microstep: 181.22 [2024-07-31 11:37:53,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28739.11 | bwd: 40810.23 | bwd_inner: 39786.77 | bwd_allreduce: 1022.97 | step: 181.92 39%|███▉ | 480/1230 [9:25:59<14:37:31, 70.20s/it] {'loss': 1.1625, 'learning_rate': 1.3933017502124897e-05, 'epoch': 0.39} 39%|███▉ | 480/1230 [9:25:59<14:37:31, 70.20s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 11:38:03,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.14 | bwd_microstep: 5615.70 | bwd_inner_microstep: 5507.84 | bwd_allreduce_microstep: 107.79 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-31 11:38:12,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.67 | bwd_microstep: 5236.09 | bwd_inner_microstep: 5139.52 | bwd_allreduce_microstep: 96.50 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2844 [2024-07-31 11:38:21,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.11 | bwd_microstep: 5311.45 | bwd_inner_microstep: 4897.11 | bwd_allreduce_microstep: 414.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-31 11:38:29,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.17 | bwd_microstep: 5123.95 | bwd_inner_microstep: 5070.60 | bwd_allreduce_microstep: 53.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 11:38:38,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.90 | bwd_microstep: 5131.13 | bwd_inner_microstep: 
5052.06 | bwd_allreduce_microstep: 79.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 11:38:47,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.24 | bwd_microstep: 5021.61 | bwd_inner_microstep: 5002.33 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 11:38:56,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.56 | bwd_microstep: 5215.07 | bwd_inner_microstep: 4810.07 | bwd_allreduce_microstep: 404.93 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 11:39:05,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 11:39:05,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.42 | bwd_microstep: 5188.65 | bwd_inner_microstep: 5113.56 | bwd_allreduce_microstep: 75.02 | step_microstep: 183.03 [2024-07-31 11:39:05,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29126.08 | bwd: 41843.64 | bwd_inner: 40593.02 | bwd_allreduce: 1250.13 | step: 183.62 39%|███▉ | 481/1230 [9:27:10<14:40:29, 70.53s/it] {'loss': 1.1982, 'learning_rate': 1.3908792577503514e-05, 'epoch': 0.39} 39%|███▉ | 481/1230 [9:27:10<14:40:29, 70.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3854 [2024-07-31 11:39:13,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3374.24 | bwd_microstep: 5253.34 | bwd_inner_microstep: 5184.04 | bwd_allreduce_microstep: 69.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811 [2024-07-31 11:39:22,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.80 | bwd_microstep: 5111.52 | bwd_inner_microstep: 5083.30 | bwd_allreduce_microstep: 28.15 | step_microstep: 0.09 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 11:39:31,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.12 | bwd_microstep: 5183.54 | bwd_inner_microstep: 5101.13 | bwd_allreduce_microstep: 82.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-31 11:39:40,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.73 | bwd_microstep: 5268.95 | bwd_inner_microstep: 4861.16 | bwd_allreduce_microstep: 407.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 11:39:49,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.25 | bwd_microstep: 5070.20 | bwd_inner_microstep: 5003.93 | bwd_allreduce_microstep: 66.20 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-31 11:39:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.14 | bwd_microstep: 4878.82 | bwd_inner_microstep: 4859.42 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 11:40:05,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.45 | bwd_microstep: 4901.20 | bwd_inner_microstep: 4525.74 | bwd_allreduce_microstep: 375.39 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2140 [2024-07-31 11:40:14,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 11:40:14,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.22 | bwd_microstep: 5064.45 | bwd_inner_microstep: 4670.13 | bwd_allreduce_microstep: 394.25 | step_microstep: 182.84 [2024-07-31 11:40:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28141.85 | bwd: 40732.00 | bwd_inner: 39288.78 | bwd_allreduce: 
1442.73 | step: 183.45 39%|███▉ | 482/1230 [9:28:20<14:34:21, 70.14s/it] {'loss': 1.2355, 'learning_rate': 1.3884540547144393e-05, 'epoch': 0.39} 39%|███▉ | 482/1230 [9:28:20<14:34:21, 70.14s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 11:40:23,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3844.79 | bwd_microstep: 5370.66 | bwd_inner_microstep: 5351.58 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-31 11:40:32,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.13 | bwd_microstep: 5092.22 | bwd_inner_microstep: 5066.97 | bwd_allreduce_microstep: 25.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 11:40:40,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.78 | bwd_microstep: 4780.71 | bwd_inner_microstep: 4743.06 | bwd_allreduce_microstep: 37.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 11:40:49,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.36 | bwd_microstep: 4980.35 | bwd_inner_microstep: 4961.01 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 11:40:57,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.40 | bwd_microstep: 4979.22 | bwd_inner_microstep: 4930.09 | bwd_allreduce_microstep: 49.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 11:41:06,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.16 | bwd_microstep: 5049.05 | bwd_inner_microstep: 4990.77 | bwd_allreduce_microstep: 58.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per 
sample: 7.0, dynamic token length: 2141 [2024-07-31 11:41:14,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.72 | bwd_microstep: 5072.96 | bwd_inner_microstep: 4679.19 | bwd_allreduce_microstep: 393.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-31 11:41:23,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 11:41:23,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.65 | bwd_microstep: 5059.19 | bwd_inner_microstep: 4664.13 | bwd_allreduce_microstep: 394.99 | step_microstep: 181.54 [2024-07-31 11:41:23,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28614.90 | bwd: 40384.36 | bwd_inner: 39386.74 | bwd_allreduce: 997.12 | step: 182.24 39%|███▉ | 483/1230 [9:29:29<14:30:11, 69.89s/it] {'loss': 1.1543, 'learning_rate': 1.3860261579224574e-05, 'epoch': 0.39} 39%|███▉ | 483/1230 [9:29:29<14:30:11, 69.89s/it]dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3919 [2024-07-31 11:41:32,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.15 | bwd_microstep: 5199.59 | bwd_inner_microstep: 5175.94 | bwd_allreduce_microstep: 23.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-07-31 11:41:41,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.76 | bwd_microstep: 5310.46 | bwd_inner_microstep: 5242.58 | bwd_allreduce_microstep: 67.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 11:41:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.40 | bwd_microstep: 5238.50 | bwd_inner_microstep: 5158.23 | bwd_allreduce_microstep: 80.19 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2116 [2024-07-31 11:41:59,210] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.21 | bwd_microstep: 5174.65 | bwd_inner_microstep: 4772.03 | bwd_allreduce_microstep: 402.56 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2240 [2024-07-31 11:42:08,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.13 | bwd_microstep: 5219.82 | bwd_inner_microstep: 4813.90 | bwd_allreduce_microstep: 405.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 11:42:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.38 | bwd_microstep: 4978.37 | bwd_inner_microstep: 4959.03 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2164 [2024-07-31 11:42:25,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.55 | bwd_microstep: 5171.46 | bwd_inner_microstep: 4769.54 | bwd_allreduce_microstep: 401.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 11:42:34,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 11:42:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.78 | bwd_microstep: 5115.89 | bwd_inner_microstep: 4716.65 | bwd_allreduce_microstep: 399.17 | step_microstep: 181.72 [2024-07-31 11:42:34,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28886.28 | bwd: 41408.71 | bwd_inner: 39607.84 | bwd_allreduce: 1800.39 | step: 182.32 39%|███▉ | 484/1230 [9:30:40<14:31:44, 70.11s/it] {'loss': 1.1948, 'learning_rate': 1.3835955842107903e-05, 'epoch': 0.39} 39%|███▉ | 484/1230 [9:30:40<14:31:44, 70.11s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3890 [2024-07-31 11:42:43,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.93 | 
bwd_microstep: 5183.18 | bwd_inner_microstep: 5139.97 | bwd_allreduce_microstep: 43.14 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 11:42:51,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.90 | bwd_microstep: 5221.35 | bwd_inner_microstep: 5133.48 | bwd_allreduce_microstep: 87.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274 [2024-07-31 11:43:00,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.27 | bwd_microstep: 5043.74 | bwd_inner_microstep: 4652.41 | bwd_allreduce_microstep: 391.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 11:43:08,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.12 | bwd_microstep: 5202.18 | bwd_inner_microstep: 5140.38 | bwd_allreduce_microstep: 61.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 11:43:17,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.45 | bwd_microstep: 5090.09 | bwd_inner_microstep: 5028.04 | bwd_allreduce_microstep: 61.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 11:43:26,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.32 | bwd_microstep: 5158.74 | bwd_inner_microstep: 5104.88 | bwd_allreduce_microstep: 53.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 11:43:35,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.05 | bwd_microstep: 5029.27 | bwd_inner_microstep: 4992.27 | bwd_allreduce_microstep: 36.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670 [2024-07-31 11:43:43,982] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 11:43:43,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.26 | bwd_microstep: 4932.49 | bwd_inner_microstep: 4904.77 | bwd_allreduce_microstep: 27.65 | step_microstep: 181.84 [2024-07-31 11:43:43,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28517.21 | bwd: 40861.03 | bwd_inner: 40096.14 | bwd_allreduce: 764.40 | step: 182.43 39%|███▉ | 485/1230 [9:31:49<14:29:04, 69.99s/it] {'loss': 1.2269, 'learning_rate': 1.3811623504343845e-05, 'epoch': 0.39} 39%|███▉ | 485/1230 [9:31:49<14:29:04, 69.99s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858 [2024-07-31 11:43:52,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3814.15 | bwd_microstep: 5147.65 | bwd_inner_microstep: 5128.51 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3866 [2024-07-31 11:44:01,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.80 | bwd_microstep: 5115.18 | bwd_inner_microstep: 5095.79 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3824 [2024-07-31 11:44:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.89 | bwd_microstep: 5051.93 | bwd_inner_microstep: 5028.24 | bwd_allreduce_microstep: 23.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-31 11:44:19,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.11 | bwd_microstep: 5162.95 | bwd_inner_microstep: 5087.19 | bwd_allreduce_microstep: 75.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 11:44:28,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.29 | 
bwd_microstep: 5191.82 | bwd_inner_microstep: 5114.53 | bwd_allreduce_microstep: 77.23 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 11:44:36,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.33 | bwd_microstep: 4996.77 | bwd_inner_microstep: 4941.27 | bwd_allreduce_microstep: 55.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 11:44:45,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.79 | bwd_microstep: 5091.79 | bwd_inner_microstep: 4696.21 | bwd_allreduce_microstep: 395.51 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 11:44:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 11:44:54,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.58 | bwd_microstep: 5000.85 | bwd_inner_microstep: 4968.52 | bwd_allreduce_microstep: 32.27 | step_microstep: 181.78 [2024-07-31 11:44:54,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29236.84 | bwd: 40758.93 | bwd_inner: 40060.19 | bwd_allreduce: 698.24 | step: 182.48 40%|███▉ | 486/1230 [9:33:00<14:29:10, 70.09s/it] {'loss': 1.2357, 'learning_rate': 1.3787264734666348e-05, 'epoch': 0.4} 40%|███▉ | 486/1230 [9:33:00<14:29:10, 70.09s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3896 [2024-07-31 11:45:03,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.33 | bwd_microstep: 5454.72 | bwd_inner_microstep: 5390.36 | bwd_allreduce_microstep: 64.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-07-31 11:45:12,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.82 | bwd_microstep: 5110.70 | bwd_inner_microstep: 5091.29 | bwd_allreduce_microstep: 
19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 11:45:21,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.08 | bwd_microstep: 5160.11 | bwd_inner_microstep: 5122.96 | bwd_allreduce_microstep: 37.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 11:45:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.84 | bwd_microstep: 5124.43 | bwd_inner_microstep: 5056.27 | bwd_allreduce_microstep: 68.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 11:45:38,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.28 | bwd_microstep: 5006.99 | bwd_inner_microstep: 4620.11 | bwd_allreduce_microstep: 386.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 11:45:47,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.50 | bwd_microstep: 5149.57 | bwd_inner_microstep: 5079.34 | bwd_allreduce_microstep: 70.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 11:45:55,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.87 | bwd_microstep: 4791.17 | bwd_inner_microstep: 4771.81 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 11:46:04,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 11:46:04,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.51 | bwd_microstep: 5064.17 | bwd_inner_microstep: 5005.42 | bwd_allreduce_microstep: 58.69 | step_microstep: 189.21 [2024-07-31 11:46:04,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28486.13 | bwd: 
40861.85 | bwd_inner: 40137.49 | bwd_allreduce: 723.88 | step: 189.79 40%|███▉ | 487/1230 [9:34:09<14:26:29, 69.97s/it] {'loss': 1.1814, 'learning_rate': 1.3762879701992642e-05, 'epoch': 0.4} 40%|███▉ | 487/1230 [9:34:09<14:26:29, 69.97s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2399 [2024-07-31 11:46:13,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.31 | bwd_microstep: 5668.09 | bwd_inner_microstep: 5233.86 | bwd_allreduce_microstep: 434.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3999 [2024-07-31 11:46:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.52 | bwd_microstep: 5231.43 | bwd_inner_microstep: 5212.17 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 11:46:31,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.84 | bwd_microstep: 5184.31 | bwd_inner_microstep: 5099.17 | bwd_allreduce_microstep: 85.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-31 11:46:39,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.19 | bwd_microstep: 5133.66 | bwd_inner_microstep: 5058.29 | bwd_allreduce_microstep: 75.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 11:46:48,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.95 | bwd_microstep: 4980.84 | bwd_inner_microstep: 4961.55 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 11:46:57,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.78 | bwd_microstep: 4902.63 | bwd_inner_microstep: 4883.26 | bwd_allreduce_microstep: 19.29 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 11:47:06,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.93 | bwd_microstep: 5080.95 | bwd_inner_microstep: 5042.13 | bwd_allreduce_microstep: 38.75 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 11:47:14,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 11:47:14,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.31 | bwd_microstep: 5110.85 | bwd_inner_microstep: 4713.84 | bwd_allreduce_microstep: 396.94 | step_microstep: 182.33 [2024-07-31 11:47:14,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29267.73 | bwd: 41292.74 | bwd_inner: 40204.21 | bwd_allreduce: 1088.03 | step: 182.94 40%|███▉ | 488/1230 [9:35:20<14:28:45, 70.25s/it] {'loss': 1.1582, 'learning_rate': 1.373846857542208e-05, 'epoch': 0.4} 40%|███▉ | 488/1230 [9:35:20<14:28:45, 70.25s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 11:47:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3880.69 | bwd_microstep: 5369.91 | bwd_inner_microstep: 5350.77 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3888 [2024-07-31 11:47:33,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.70 | bwd_microstep: 5109.58 | bwd_inner_microstep: 5090.26 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-07-31 11:47:41,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.34 | bwd_microstep: 5022.68 | bwd_inner_microstep: 4984.10 | bwd_allreduce_microstep: 38.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 
[2024-07-31 11:47:50,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.38 | bwd_microstep: 5012.52 | bwd_inner_microstep: 4972.38 | bwd_allreduce_microstep: 40.07 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3671 [2024-07-31 11:47:59,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.33 | bwd_microstep: 5178.86 | bwd_inner_microstep: 5089.56 | bwd_allreduce_microstep: 89.24 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2155 [2024-07-31 11:48:07,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.92 | bwd_microstep: 5219.69 | bwd_inner_microstep: 4815.59 | bwd_allreduce_microstep: 404.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-07-31 11:48:15,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.98 | bwd_microstep: 4705.65 | bwd_inner_microstep: 4683.84 | bwd_allreduce_microstep: 21.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 11:48:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 11:48:24,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.49 | bwd_microstep: 4908.37 | bwd_inner_microstep: 4884.94 | bwd_allreduce_microstep: 23.35 | step_microstep: 182.69 [2024-07-31 11:48:24,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28924.72 | bwd: 40527.25 | bwd_inner: 39871.37 | bwd_allreduce: 655.37 | step: 183.38 40%|███▉ | 489/1230 [9:36:30<14:25:51, 70.11s/it] {'loss': 1.1656, 'learning_rate': 1.3714031524234965e-05, 'epoch': 0.4} 40%|███▉ | 489/1230 [9:36:30<14:25:51, 70.11s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3955 [2024-07-31 11:48:34,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3861.95 | bwd_microstep: 5438.12 | bwd_inner_microstep: 5384.37 | bwd_allreduce_microstep: 53.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-31 11:48:42,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.45 | bwd_microstep: 5288.61 | bwd_inner_microstep: 5185.92 | bwd_allreduce_microstep: 102.62 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3577 [2024-07-31 11:48:51,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3103.51 | bwd_microstep: 4927.36 | bwd_inner_microstep: 4869.16 | bwd_allreduce_microstep: 58.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 11:48:59,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.88 | bwd_microstep: 5123.07 | bwd_inner_microstep: 5052.96 | bwd_allreduce_microstep: 70.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 11:49:08,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.05 | bwd_microstep: 4965.41 | bwd_inner_microstep: 4936.80 | bwd_allreduce_microstep: 28.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 11:49:17,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.62 | bwd_microstep: 4890.88 | bwd_inner_microstep: 4871.54 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2193 [2024-07-31 11:49:25,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.10 | bwd_microstep: 5050.39 | bwd_inner_microstep: 4660.86 | bwd_allreduce_microstep: 389.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 
11:49:33,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 11:49:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.65 | bwd_microstep: 4738.48 | bwd_inner_microstep: 4707.77 | bwd_allreduce_microstep: 30.64 | step_microstep: 181.43 [2024-07-31 11:49:33,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28261.10 | bwd: 40422.30 | bwd_inner: 39669.31 | bwd_allreduce: 752.51 | step: 182.00 40%|███▉ | 490/1230 [9:37:39<14:20:38, 69.78s/it] {'loss': 1.174, 'learning_rate': 1.368956871789138e-05, 'epoch': 0.4} 40%|███▉ | 490/1230 [9:37:39<14:20:38, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-31 11:49:42,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.69 | bwd_microstep: 5355.04 | bwd_inner_microstep: 5256.68 | bwd_allreduce_microstep: 98.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-31 11:49:50,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.20 | bwd_microstep: 4936.28 | bwd_inner_microstep: 4882.53 | bwd_allreduce_microstep: 53.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 11:49:59,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.43 | bwd_microstep: 5124.17 | bwd_inner_microstep: 5049.89 | bwd_allreduce_microstep: 74.21 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3678 [2024-07-31 11:50:08,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.32 | bwd_microstep: 5196.32 | bwd_inner_microstep: 5105.72 | bwd_allreduce_microstep: 90.54 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3693 [2024-07-31 11:50:17,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.29 
| bwd_microstep: 5031.34 | bwd_inner_microstep: 4960.59 | bwd_allreduce_microstep: 70.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677 [2024-07-31 11:50:25,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.27 | bwd_microstep: 4972.52 | bwd_inner_microstep: 4937.01 | bwd_allreduce_microstep: 35.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 11:50:34,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.31 | bwd_microstep: 4924.21 | bwd_inner_microstep: 4898.86 | bwd_allreduce_microstep: 25.27 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 11:50:43,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 11:50:43,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.13 | bwd_microstep: 4991.93 | bwd_inner_microstep: 4597.30 | bwd_allreduce_microstep: 394.56 | step_microstep: 182.55 [2024-07-31 11:50:43,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28735.55 | bwd: 40531.79 | bwd_inner: 39688.51 | bwd_allreduce: 842.79 | step: 183.14 40%|███▉ | 491/1230 [9:38:49<14:18:48, 69.73s/it] {'loss': 1.136, 'learning_rate': 1.3665080326030001e-05, 'epoch': 0.4} 40%|███▉ | 491/1230 [9:38:49<14:18:48, 69.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3933 [2024-07-31 11:50:52,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.03 | bwd_microstep: 5132.02 | bwd_inner_microstep: 5101.89 | bwd_allreduce_microstep: 30.05 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3577 [2024-07-31 11:51:01,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.16 | bwd_microstep: 5271.39 | bwd_inner_microstep: 5118.19 | bwd_allreduce_microstep: 
153.14 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820 [2024-07-31 11:51:10,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.95 | bwd_microstep: 5168.59 | bwd_inner_microstep: 5134.67 | bwd_allreduce_microstep: 33.84 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3592 [2024-07-31 11:51:18,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.60 | bwd_microstep: 5131.28 | bwd_inner_microstep: 5043.31 | bwd_allreduce_microstep: 87.89 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 11:51:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.77 | bwd_microstep: 5144.73 | bwd_inner_microstep: 5064.29 | bwd_allreduce_microstep: 80.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3659 [2024-07-31 11:51:36,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.78 | bwd_microstep: 5199.21 | bwd_inner_microstep: 5101.14 | bwd_allreduce_microstep: 98.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 11:51:45,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.22 | bwd_microstep: 4991.47 | bwd_inner_microstep: 4972.15 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-31 11:51:54,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 11:51:54,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.98 | bwd_microstep: 5177.89 | bwd_inner_microstep: 5124.93 | bwd_allreduce_microstep: 52.89 | step_microstep: 181.36 [2024-07-31 11:51:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29262.40 | bwd: 
41216.55 | bwd_inner: 40660.49 | bwd_allreduce: 555.53 | step: 182.05 40%|████ | 492/1230 [9:39:59<14:21:38, 70.05s/it] {'loss': 1.2108, 'learning_rate': 1.364056651846693e-05, 'epoch': 0.4} 40%|████ | 492/1230 [9:40:00<14:21:38, 70.05s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 11:52:03,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.91 | bwd_microstep: 5410.11 | bwd_inner_microstep: 5301.70 | bwd_allreduce_microstep: 108.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-31 11:52:12,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.08 | bwd_microstep: 5321.75 | bwd_inner_microstep: 5218.10 | bwd_allreduce_microstep: 103.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 11:52:20,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.89 | bwd_microstep: 5117.50 | bwd_inner_microstep: 5046.28 | bwd_allreduce_microstep: 71.15 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 11:52:29,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.54 | bwd_microstep: 5077.52 | bwd_inner_microstep: 5034.36 | bwd_allreduce_microstep: 43.10 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 11:52:37,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.19 | bwd_microstep: 4940.73 | bwd_inner_microstep: 4560.20 | bwd_allreduce_microstep: 380.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 11:52:45,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.01 | bwd_microstep: 4988.46 | bwd_inner_microstep: 4603.16 | bwd_allreduce_microstep: 385.24 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-31 11:52:54,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.60 | bwd_microstep: 5110.37 | bwd_inner_microstep: 4714.50 | bwd_allreduce_microstep: 395.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 11:53:03,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 11:53:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.77 | bwd_microstep: 5038.05 | bwd_inner_microstep: 4983.83 | bwd_allreduce_microstep: 54.15 | step_microstep: 183.20 [2024-07-31 11:53:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27768.90 | bwd: 41004.48 | bwd_inner: 39462.07 | bwd_allreduce: 1541.93 | step: 183.80 40%|████ | 493/1230 [9:41:09<14:16:59, 69.77s/it] {'loss': 1.2016, 'learning_rate': 1.3616027465194525e-05, 'epoch': 0.4} 40%|████ | 493/1230 [9:41:09<14:16:59, 69.77s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4072 [2024-07-31 11:53:12,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.31 | bwd_microstep: 5195.61 | bwd_inner_microstep: 5173.08 | bwd_allreduce_microstep: 22.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3864 [2024-07-31 11:53:20,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.86 | bwd_microstep: 5066.97 | bwd_inner_microstep: 5033.17 | bwd_allreduce_microstep: 33.73 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-31 11:53:29,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.70 | bwd_microstep: 5226.28 | bwd_inner_microstep: 4819.41 | bwd_allreduce_microstep: 406.80 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic 
token length: 3670 [2024-07-31 11:53:38,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.89 | bwd_microstep: 5034.59 | bwd_inner_microstep: 4992.07 | bwd_allreduce_microstep: 42.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 11:53:47,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.39 | bwd_microstep: 5181.93 | bwd_inner_microstep: 5128.66 | bwd_allreduce_microstep: 53.20 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 11:53:55,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.35 | bwd_microstep: 5121.72 | bwd_inner_microstep: 5050.40 | bwd_allreduce_microstep: 71.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 11:54:04,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.62 | bwd_microstep: 4924.33 | bwd_inner_microstep: 4897.65 | bwd_allreduce_microstep: 26.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 11:54:13,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 11:54:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.76 | bwd_microstep: 5152.44 | bwd_inner_microstep: 5082.06 | bwd_allreduce_microstep: 70.31 | step_microstep: 182.95 [2024-07-31 11:54:13,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29137.79 | bwd: 40903.85 | bwd_inner: 40176.45 | bwd_allreduce: 726.92 | step: 183.55 40%|████ | 494/1230 [9:42:19<14:18:04, 69.95s/it] {'loss': 1.2034, 'learning_rate': 1.35914633363802e-05, 'epoch': 0.4} 40%|████ | 494/1230 [9:42:19<14:18:04, 69.95s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 11:54:22,901] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3873.80 | bwd_microstep: 5406.50 | bwd_inner_microstep: 5380.84 | bwd_allreduce_microstep: 25.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304 [2024-07-31 11:54:31,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.51 | bwd_microstep: 5393.82 | bwd_inner_microstep: 4977.10 | bwd_allreduce_microstep: 416.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-31 11:54:40,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.30 | bwd_microstep: 5239.11 | bwd_inner_microstep: 5146.97 | bwd_allreduce_microstep: 92.07 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3817 [2024-07-31 11:54:49,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.74 | bwd_microstep: 5193.00 | bwd_inner_microstep: 5155.40 | bwd_allreduce_microstep: 37.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3750 [2024-07-31 11:54:58,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.80 | bwd_microstep: 5163.92 | bwd_inner_microstep: 5079.66 | bwd_allreduce_microstep: 84.18 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-31 11:55:07,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.56 | bwd_microstep: 5074.67 | bwd_inner_microstep: 4680.78 | bwd_allreduce_microstep: 393.82 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 11:55:15,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.68 | bwd_microstep: 5084.25 | bwd_inner_microstep: 4691.61 | bwd_allreduce_microstep: 392.56 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1115 [2024-07-31 
11:55:24,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 11:55:24,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3453.89 | bwd_microstep: 5104.98 | bwd_inner_microstep: 4716.90 | bwd_allreduce_microstep: 388.02 | step_microstep: 181.72 [2024-07-31 11:55:24,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28823.17 | bwd: 41660.24 | bwd_inner: 39829.19 | bwd_allreduce: 1830.53 | step: 182.42 40%|████ | 495/1230 [9:43:30<14:20:05, 70.21s/it] {'loss': 1.1206, 'learning_rate': 1.3566874302365262e-05, 'epoch': 0.4} 40%|████ | 495/1230 [9:43:30<14:20:05, 70.21s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574 [2024-07-31 11:55:33,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.45 | bwd_microstep: 5191.24 | bwd_inner_microstep: 5099.80 | bwd_allreduce_microstep: 91.37 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-07-31 11:55:42,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.04 | bwd_microstep: 5024.46 | bwd_inner_microstep: 5004.98 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 11:55:50,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.20 | bwd_microstep: 5029.29 | bwd_inner_microstep: 5006.11 | bwd_allreduce_microstep: 23.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 11:55:59,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.90 | bwd_microstep: 5160.83 | bwd_inner_microstep: 4759.14 | bwd_allreduce_microstep: 401.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 11:56:08,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3764.41 | bwd_microstep: 5024.82 | bwd_inner_microstep: 4981.83 | bwd_allreduce_microstep: 42.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 11:56:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.77 | bwd_microstep: 5264.46 | bwd_inner_microstep: 4856.80 | bwd_allreduce_microstep: 407.59 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2103 [2024-07-31 11:56:25,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.59 | bwd_microstep: 5042.39 | bwd_inner_microstep: 4651.55 | bwd_allreduce_microstep: 390.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 11:56:33,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 11:56:33,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.72 | bwd_microstep: 4818.33 | bwd_inner_microstep: 4781.27 | bwd_allreduce_microstep: 37.00 | step_microstep: 182.14 [2024-07-31 11:56:33,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28618.96 | bwd: 40555.79 | bwd_inner: 39141.42 | bwd_allreduce: 1413.90 | step: 182.72 40%|████ | 496/1230 [9:44:39<14:16:19, 70.00s/it] {'loss': 1.1982, 'learning_rate': 1.3542260533663723e-05, 'epoch': 0.4} 40%|████ | 496/1230 [9:44:39<14:16:19, 70.00s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3921 [2024-07-31 11:56:43,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.49 | bwd_microstep: 5533.01 | bwd_inner_microstep: 5443.33 | bwd_allreduce_microstep: 89.61 | step_microstep: 0.10 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3791 [2024-07-31 11:56:52,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.87 | bwd_microstep: 5207.38 | bwd_inner_microstep: 5166.05 | 
bwd_allreduce_microstep: 41.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2260 [2024-07-31 11:57:00,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.57 | bwd_microstep: 5026.26 | bwd_inner_microstep: 4632.08 | bwd_allreduce_microstep: 394.10 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-31 11:57:09,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.66 | bwd_microstep: 5061.81 | bwd_inner_microstep: 5035.06 | bwd_allreduce_microstep: 26.68 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2863 [2024-07-31 11:57:18,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.90 | bwd_microstep: 5112.51 | bwd_inner_microstep: 4714.57 | bwd_allreduce_microstep: 397.86 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 11:57:26,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.10 | bwd_microstep: 4879.09 | bwd_inner_microstep: 4859.72 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 11:57:35,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.21 | bwd_microstep: 4981.91 | bwd_inner_microstep: 4962.54 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 11:57:44,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 11:57:44,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.12 | bwd_microstep: 4925.47 | bwd_inner_microstep: 4899.87 | bwd_allreduce_microstep: 25.54 | step_microstep: 182.58 [2024-07-31 11:57:44,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29239.83 | bwd: 40727.43 | bwd_inner: 39713.17 | bwd_allreduce: 1013.75 | step: 183.30 40%|████ | 497/1230 [9:45:50<14:16:16, 70.09s/it] {'loss': 1.1445, 'learning_rate': 1.351762220096112e-05, 'epoch': 0.4} 40%|████ | 497/1230 [9:45:50<14:16:16, 70.09s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 11:57:53,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.29 | bwd_microstep: 5183.13 | bwd_inner_microstep: 5146.12 | bwd_allreduce_microstep: 36.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 11:58:02,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.13 | bwd_microstep: 5365.22 | bwd_inner_microstep: 5286.07 | bwd_allreduce_microstep: 79.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920 [2024-07-31 11:58:11,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.36 | bwd_microstep: 5153.22 | bwd_inner_microstep: 5133.91 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 11:58:20,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.33 | bwd_microstep: 5002.94 | bwd_inner_microstep: 4983.59 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 11:58:28,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.43 | bwd_microstep: 5120.08 | bwd_inner_microstep: 5052.43 | bwd_allreduce_microstep: 67.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 11:58:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.71 | bwd_microstep: 4916.32 | bwd_inner_microstep: 4896.90 | bwd_allreduce_microstep: 
19.35 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3699 [2024-07-31 11:58:45,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.37 | bwd_microstep: 5041.37 | bwd_inner_microstep: 4965.62 | bwd_allreduce_microstep: 75.68 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2921 [2024-07-31 11:58:54,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 11:58:54,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.48 | bwd_microstep: 5102.36 | bwd_inner_microstep: 4739.28 | bwd_allreduce_microstep: 363.01 | step_microstep: 181.57 [2024-07-31 11:58:54,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29391.00 | bwd: 40884.63 | bwd_inner: 40203.86 | bwd_allreduce: 680.28 | step: 182.15 40%|████ | 498/1230 [9:47:00<14:17:00, 70.25s/it] {'loss': 1.1877, 'learning_rate': 1.3492959475113332e-05, 'epoch': 0.4} 40%|████ | 498/1230 [9:47:00<14:17:00, 70.25s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3905 [2024-07-31 11:59:04,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3864.16 | bwd_microstep: 5393.24 | bwd_inner_microstep: 5343.75 | bwd_allreduce_microstep: 49.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3849 [2024-07-31 11:59:12,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.92 | bwd_microstep: 5110.52 | bwd_inner_microstep: 5071.33 | bwd_allreduce_microstep: 39.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3658 [2024-07-31 11:59:21,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.23 | bwd_microstep: 4961.82 | bwd_inner_microstep: 4927.76 | bwd_allreduce_microstep: 33.98 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3732
[2024-07-31 11:59:30,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.49 | bwd_microstep: 5147.08 | bwd_inner_microstep: 5092.28 | bwd_allreduce_microstep: 54.73 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756
[2024-07-31 11:59:39,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.99 | bwd_microstep: 5013.46 | bwd_inner_microstep: 4994.11 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615
[2024-07-31 11:59:48,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3891.64 | bwd_microstep: 5046.91 | bwd_inner_microstep: 4982.38 | bwd_allreduce_microstep: 64.46 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 11:59:56,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.08 | bwd_microstep: 5129.31 | bwd_inner_microstep: 5061.34 | bwd_allreduce_microstep: 67.90 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154
[2024-07-31 12:00:05,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 12:00:05,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.84 | bwd_microstep: 5158.03 | bwd_inner_microstep: 4756.14 | bwd_allreduce_microstep: 401.83 | step_microstep: 181.57
[2024-07-31 12:00:05,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29549.25 | bwd: 40960.34 | bwd_inner: 40229.04 | bwd_allreduce: 730.80 | step: 182.26
41%|████ | 499/1230 [9:48:11<14:18:01, 70.43s/it]
{'loss': 1.1683, 'learning_rate': 1.3468272527145388e-05, 'epoch': 0.41}
41%|████ | 499/1230 [9:48:11<14:18:01, 70.43s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4071
[2024-07-31 12:00:14,969] [INFO] [logging.py:96:log_dist]
[Rank 0] time (ms) | fwd_microstep: 3862.56 | bwd_microstep: 5403.46 | bwd_inner_microstep: 5384.31 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838
[2024-07-31 12:00:23,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.58 | bwd_microstep: 5171.70 | bwd_inner_microstep: 5126.25 | bwd_allreduce_microstep: 45.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752
[2024-07-31 12:00:32,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.98 | bwd_microstep: 5246.15 | bwd_inner_microstep: 5183.69 | bwd_allreduce_microstep: 62.39 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794
[2024-07-31 12:00:41,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.60 | bwd_microstep: 5018.07 | bwd_inner_microstep: 4998.72 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661
[2024-07-31 12:00:50,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.41 | bwd_microstep: 5181.52 | bwd_inner_microstep: 5109.34 | bwd_allreduce_microstep: 72.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 12:00:58,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.66 | bwd_microstep: 5106.77 | bwd_inner_microstep: 5058.78 | bwd_allreduce_microstep: 47.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 12:01:07,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.34 | bwd_microstep: 5085.73 | bwd_inner_microstep: 4693.03 | bwd_allreduce_microstep: 392.64 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2045
[2024-07-31 12:01:16,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66
[2024-07-31 12:01:16,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.47 | bwd_microstep: 5227.91 | bwd_inner_microstep: 4821.76 | bwd_allreduce_microstep: 406.08 | step_microstep: 183.11
[2024-07-31 12:01:16,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29100.50 | bwd: 41441.29 | bwd_inner: 40375.82 | bwd_allreduce: 1064.98 | step: 183.69
41%|████ | 500/1230 [9:49:22<14:18:30, 70.56s/it]
{'loss': 1.1368, 'learning_rate': 1.3443561528250295e-05, 'epoch': 0.41}
41%|████ | 500/1230 [9:49:22<14:18:30, 70.56s/it]
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096
[2024-07-31 12:01:25,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.73 | bwd_microstep: 5254.93 | bwd_inner_microstep: 5235.82 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762
[2024-07-31 12:01:34,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3398.19 | bwd_microstep: 5179.25 | bwd_inner_microstep: 5127.11 | bwd_allreduce_microstep: 52.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 12:01:43,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.45 | bwd_microstep: 5218.36 | bwd_inner_microstep: 5130.40 | bwd_allreduce_microstep: 87.89 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3701
[2024-07-31 12:01:51,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.62 | bwd_microstep: 5096.46 | bwd_inner_microstep: 5013.22 | bwd_allreduce_microstep: 83.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603
[2024-07-31 12:02:00,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3544.62 | bwd_microstep: 5000.17 | bwd_inner_microstep: 4942.32 | bwd_allreduce_microstep: 57.78 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 12:02:08,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.95 | bwd_microstep: 5021.46 | bwd_inner_microstep: 4633.58 | bwd_allreduce_microstep: 387.80 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647
[2024-07-31 12:02:17,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.07 | bwd_microstep: 5215.94 | bwd_inner_microstep: 5130.30 | bwd_allreduce_microstep: 85.57 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 12:02:26,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 12:02:26,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.46 | bwd_microstep: 5058.92 | bwd_inner_microstep: 4999.79 | bwd_allreduce_microstep: 59.06 | step_microstep: 182.11
[2024-07-31 12:02:26,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28589.00 | bwd: 41045.46 | bwd_inner: 40212.48 | bwd_allreduce: 832.49 | step: 182.72
41%|████ | 501/1230 [9:50:32<14:15:11, 70.39s/it]
{'loss': 1.154, 'learning_rate': 1.3418826649787834e-05, 'epoch': 0.41}
41%|████ | 501/1230 [9:50:32<14:15:11, 70.39s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2463
[2024-07-31 12:02:35,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.77 | bwd_microstep: 5391.00 | bwd_inner_microstep: 4977.60 | bwd_allreduce_microstep: 413.33 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3856
[2024-07-31 12:02:44,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.69 | bwd_microstep: 5015.66 | bwd_inner_microstep: 4988.01 |
bwd_allreduce_microstep: 27.58 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807
[2024-07-31 12:02:52,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.39 | bwd_microstep: 5163.54 | bwd_inner_microstep: 5113.24 | bwd_allreduce_microstep: 50.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714
[2024-07-31 12:03:01,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.80 | bwd_microstep: 5152.07 | bwd_inner_microstep: 5097.03 | bwd_allreduce_microstep: 54.96 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 12:03:10,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.18 | bwd_microstep: 5177.52 | bwd_inner_microstep: 5097.91 | bwd_allreduce_microstep: 79.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729
[2024-07-31 12:03:19,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.99 | bwd_microstep: 5070.36 | bwd_inner_microstep: 5026.57 | bwd_allreduce_microstep: 43.72 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129
[2024-07-31 12:03:27,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.42 | bwd_microstep: 5121.62 | bwd_inner_microstep: 4723.93 | bwd_allreduce_microstep: 397.62 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692
[2024-07-31 12:03:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 12:03:36,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.29 | bwd_microstep: 5030.64 | bwd_inner_microstep: 4973.04 | bwd_allreduce_microstep: 57.54 | step_microstep: 181.84
[2024-07-31 12:03:36,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd: 28610.43 | bwd: 41122.38 | bwd_inner: 39997.25 | bwd_allreduce: 1124.62 | step: 182.54
41%|████ | 502/1230 [9:51:42<14:12:50, 70.29s/it]
{'loss': 1.177, 'learning_rate': 1.3394068063283387e-05, 'epoch': 0.41}
41%|████ | 502/1230 [9:51:42<14:12:50, 70.29s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 12:03:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.01 | bwd_microstep: 5204.81 | bwd_inner_microstep: 5185.72 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573
[2024-07-31 12:03:54,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.81 | bwd_microstep: 5138.84 | bwd_inner_microstep: 5055.77 | bwd_allreduce_microstep: 83.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614
[2024-07-31 12:04:03,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.77 | bwd_microstep: 5189.27 | bwd_inner_microstep: 5107.70 | bwd_allreduce_microstep: 81.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3627
[2024-07-31 12:04:12,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.85 | bwd_microstep: 5266.88 | bwd_inner_microstep: 5149.82 | bwd_allreduce_microstep: 116.98 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259
[2024-07-31 12:04:20,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.01 | bwd_microstep: 5110.94 | bwd_inner_microstep: 4712.31 | bwd_allreduce_microstep: 398.56 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702
[2024-07-31 12:04:29,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.78 | bwd_microstep: 5024.84 | bwd_inner_microstep: 4986.65 | bwd_allreduce_microstep:
38.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668
[2024-07-31 12:04:38,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.64 | bwd_microstep: 5111.46 | bwd_inner_microstep: 5045.69 | bwd_allreduce_microstep: 65.70 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 12:04:47,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 12:04:47,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.67 | bwd_microstep: 5012.51 | bwd_inner_microstep: 4972.02 | bwd_allreduce_microstep: 40.42 | step_microstep: 182.54
[2024-07-31 12:04:47,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29083.44 | bwd: 41059.53 | bwd_inner: 40215.60 | bwd_allreduce: 843.43 | step: 183.12
41%|████ | 503/1230 [9:52:52<14:12:21, 70.35s/it]
{'loss': 1.1707, 'learning_rate': 1.3369285940426737e-05, 'epoch': 0.41}
41%|████ | 503/1230 [9:52:52<14:12:21, 70.35s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3952
[2024-07-31 12:04:56,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.36 | bwd_microstep: 5186.35 | bwd_inner_microstep: 5166.92 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3856
[2024-07-31 12:05:05,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.25 | bwd_microstep: 5467.44 | bwd_inner_microstep: 5400.73 | bwd_allreduce_microstep: 66.64 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2059
[2024-07-31 12:05:14,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.91 | bwd_microstep: 5212.03 | bwd_inner_microstep: 4806.77 | bwd_allreduce_microstep: 405.19 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0,
dynamic token length: 3745
[2024-07-31 12:05:22,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.29 | bwd_microstep: 4984.60 | bwd_inner_microstep: 4965.23 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 12:05:31,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.09 | bwd_microstep: 5131.54 | bwd_inner_microstep: 5061.59 | bwd_allreduce_microstep: 69.87 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3638
[2024-07-31 12:05:40,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.73 | bwd_microstep: 4942.64 | bwd_inner_microstep: 4908.37 | bwd_allreduce_microstep: 34.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 12:05:48,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.28 | bwd_microstep: 5023.72 | bwd_inner_microstep: 4960.81 | bwd_allreduce_microstep: 62.84 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702
[2024-07-31 12:05:57,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 12:05:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.44 | bwd_microstep: 4997.57 | bwd_inner_microstep: 4944.40 | bwd_allreduce_microstep: 53.10 | step_microstep: 181.77
[2024-07-31 12:05:57,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29203.25 | bwd: 40945.88 | bwd_inner: 40214.78 | bwd_allreduce: 730.60 | step: 182.48
41%|████ | 504/1230 [9:54:03<14:11:41, 70.39s/it]
{'loss': 1.1674, 'learning_rate': 1.3344480453070882e-05, 'epoch': 0.41}
41%|████ | 504/1230 [9:54:03<14:11:41, 70.39s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710
[2024-07-31 12:06:06,193] [INFO] [logging.py:96:log_dist]
[Rank 0] time (ms) | fwd_microstep: 3401.97 | bwd_microstep: 5210.18 | bwd_inner_microstep: 5136.12 | bwd_allreduce_microstep: 73.99 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 12:06:14,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.56 | bwd_microstep: 4983.47 | bwd_inner_microstep: 4964.16 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727
[2024-07-31 12:06:23,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.12 | bwd_microstep: 5121.06 | bwd_inner_microstep: 5082.82 | bwd_allreduce_microstep: 38.18 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199
[2024-07-31 12:06:32,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.10 | bwd_microstep: 5110.49 | bwd_inner_microstep: 4712.71 | bwd_allreduce_microstep: 397.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820
[2024-07-31 12:06:41,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.96 | bwd_microstep: 5046.28 | bwd_inner_microstep: 5026.92 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 12:06:50,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.07 | bwd_microstep: 4991.66 | bwd_inner_microstep: 4972.31 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165
[2024-07-31 12:06:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.14 | bwd_microstep: 5393.34 | bwd_inner_microstep: 4926.96 | bwd_allreduce_microstep: 466.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 12:07:07,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.83
[2024-07-31 12:07:07,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.46 | bwd_microstep: 4875.10 | bwd_inner_microstep: 4855.71 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.77
[2024-07-31 12:07:07,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29156.28 | bwd: 40731.56 | bwd_inner: 39677.64 | bwd_allreduce: 1053.43 | step: 182.35
41%|████ | 505/1230 [9:55:13<14:09:54, 70.34s/it]
{'loss': 1.2047, 'learning_rate': 1.331965177323084e-05, 'epoch': 0.41}
41%|████ | 505/1230 [9:55:13<14:09:54, 70.34s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 12:07:17,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3852.74 | bwd_microstep: 5703.04 | bwd_inner_microstep: 5683.90 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828
[2024-07-31 12:07:26,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.72 | bwd_microstep: 5063.18 | bwd_inner_microstep: 5041.57 | bwd_allreduce_microstep: 21.54 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2253
[2024-07-31 12:07:35,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.78 | bwd_microstep: 5232.45 | bwd_inner_microstep: 4824.75 | bwd_allreduce_microstep: 407.63 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610
[2024-07-31 12:07:43,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.83 | bwd_microstep: 5118.85 | bwd_inner_microstep: 5040.01 | bwd_allreduce_microstep: 78.77 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2198
[2024-07-31 12:07:52,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3521.00 | bwd_microstep: 5161.43 | bwd_inner_microstep: 4759.95 | bwd_allreduce_microstep: 401.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187
[2024-07-31 12:08:01,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.84 | bwd_microstep: 5229.10 | bwd_inner_microstep: 4822.79 | bwd_allreduce_microstep: 406.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3648
[2024-07-31 12:08:09,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.54 | bwd_microstep: 5017.37 | bwd_inner_microstep: 4944.10 | bwd_allreduce_microstep: 73.21 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2157
[2024-07-31 12:08:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 12:08:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.27 | bwd_microstep: 4904.09 | bwd_inner_microstep: 4526.71 | bwd_allreduce_microstep: 377.32 | step_microstep: 181.66
[2024-07-31 12:08:17,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28442.64 | bwd: 41429.48 | bwd_inner: 39643.71 | bwd_allreduce: 1785.28 | step: 182.24
41%|████ | 506/1230 [9:56:23<14:08:14, 70.30s/it]
{'loss': 1.1635, 'learning_rate': 1.3294800073082465e-05, 'epoch': 0.41}
41%|████ | 506/1230 [9:56:23<14:08:14, 70.30s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3985
[2024-07-31 12:08:26,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.42 | bwd_microstep: 5123.99 | bwd_inner_microstep: 5101.62 | bwd_allreduce_microstep: 22.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 12:08:35,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.35 | bwd_microstep: 5321.79 | bwd_inner_microstep: 5212.78 |
bwd_allreduce_microstep: 108.95 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3751
[2024-07-31 12:08:43,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3110.68 | bwd_microstep: 4956.14 | bwd_inner_microstep: 4917.77 | bwd_allreduce_microstep: 38.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155
[2024-07-31 12:08:52,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.08 | bwd_microstep: 5207.26 | bwd_inner_microstep: 4803.40 | bwd_allreduce_microstep: 403.80 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3845
[2024-07-31 12:09:01,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.44 | bwd_microstep: 4860.70 | bwd_inner_microstep: 4839.27 | bwd_allreduce_microstep: 21.36 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710
[2024-07-31 12:09:09,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.78 | bwd_microstep: 5063.31 | bwd_inner_microstep: 5005.80 | bwd_allreduce_microstep: 57.43 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3635
[2024-07-31 12:09:17,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3041.50 | bwd_microstep: 4787.56 | bwd_inner_microstep: 4747.26 | bwd_allreduce_microstep: 40.23 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2134
[2024-07-31 12:09:26,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 12:09:26,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.69 | bwd_microstep: 5045.01 | bwd_inner_microstep: 4653.70 | bwd_allreduce_microstep: 391.24 | step_microstep: 183.11
[2024-07-31 12:09:26,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd: 27568.85 | bwd: 40365.75 | bwd_inner: 39281.53 | bwd_allreduce: 1083.73 | step: 183.72
41%|████ | 507/1230 [9:57:32<13:59:43, 69.69s/it]
{'loss': 1.1921, 'learning_rate': 1.3269925524961237e-05, 'epoch': 0.41}
41%|████ | 507/1230 [9:57:32<13:59:43, 69.69s/it]
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3958
[2024-07-31 12:09:35,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.84 | bwd_microstep: 5247.85 | bwd_inner_microstep: 5213.66 | bwd_allreduce_microstep: 34.11 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3910
[2024-07-31 12:09:44,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.73 | bwd_microstep: 5327.14 | bwd_inner_microstep: 5254.29 | bwd_allreduce_microstep: 72.77 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3878
[2024-07-31 12:09:53,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.51 | bwd_microstep: 5123.65 | bwd_inner_microstep: 5068.90 | bwd_allreduce_microstep: 54.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3855
[2024-07-31 12:10:01,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.48 | bwd_microstep: 5120.96 | bwd_inner_microstep: 5101.59 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 12:10:10,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.86 | bwd_microstep: 5066.30 | bwd_inner_microstep: 5004.51 | bwd_allreduce_microstep: 61.72 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141
[2024-07-31 12:10:19,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.21 | bwd_microstep: 5179.20 | bwd_inner_microstep: 4778.46 | bwd_allreduce_microstep:
400.67 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147
[2024-07-31 12:10:27,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.50 | bwd_microstep: 5103.35 | bwd_inner_microstep: 4707.09 | bwd_allreduce_microstep: 396.18 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162
[2024-07-31 12:10:36,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 12:10:36,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.26 | bwd_microstep: 5093.10 | bwd_inner_microstep: 4697.15 | bwd_allreduce_microstep: 395.88 | step_microstep: 183.32
[2024-07-31 12:10:36,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28913.26 | bwd: 41261.50 | bwd_inner: 39825.61 | bwd_allreduce: 1435.42 | step: 184.03
41%|████▏ | 508/1230 [9:58:42<14:01:33, 69.94s/it]
{'loss': 1.1486, 'learning_rate': 1.3245028301361086e-05, 'epoch': 0.41}
41%|████▏ | 508/1230 [9:58:42<14:01:33, 69.94s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 12:10:45,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.06 | bwd_microstep: 5229.18 | bwd_inner_microstep: 5209.53 | bwd_allreduce_microstep: 19.58 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 12:10:54,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.57 | bwd_microstep: 5069.07 | bwd_inner_microstep: 5039.84 | bwd_allreduce_microstep: 29.16 | step_microstep: 0.08
dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2828
[2024-07-31 12:11:03,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.75 | bwd_microstep: 5254.22 | bwd_inner_microstep: 4846.26 | bwd_allreduce_microstep: 407.90 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample:
13.0, dynamic token length: 3758
[2024-07-31 12:11:12,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.78 | bwd_microstep: 4997.38 | bwd_inner_microstep: 4978.00 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728
[2024-07-31 12:11:21,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.93 | bwd_microstep: 5169.58 | bwd_inner_microstep: 5114.05 | bwd_allreduce_microstep: 55.47 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186
[2024-07-31 12:11:29,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.69 | bwd_microstep: 5192.11 | bwd_inner_microstep: 4786.82 | bwd_allreduce_microstep: 405.23 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2168
[2024-07-31 12:11:37,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.84 | bwd_microstep: 4934.27 | bwd_inner_microstep: 4557.76 | bwd_allreduce_microstep: 376.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 12:11:46,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 12:11:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.34 | bwd_microstep: 5002.87 | bwd_inner_microstep: 4949.54 | bwd_allreduce_microstep: 53.26 | step_microstep: 181.90
[2024-07-31 12:11:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28582.85 | bwd: 40848.66 | bwd_inner: 39481.74 | bwd_allreduce: 1366.45 | step: 182.48
41%|████▏ | 509/1230 [9:59:52<13:59:46, 69.88s/it]
{'loss': 1.1625, 'learning_rate': 1.3220108574933185e-05, 'epoch': 0.41}
41%|████▏ | 509/1230 [9:59:52<13:59:46, 69.88s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874
[2024-07-31 12:11:55,534] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.20 | bwd_microstep: 5202.73 | bwd_inner_microstep: 5172.19 | bwd_allreduce_microstep: 30.47 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3839
[2024-07-31 12:12:04,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.75 | bwd_microstep: 5037.38 | bwd_inner_microstep: 5018.08 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234
[2024-07-31 12:12:12,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.68 | bwd_microstep: 5022.82 | bwd_inner_microstep: 4635.78 | bwd_allreduce_microstep: 386.98 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 12:12:21,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.36 | bwd_microstep: 4919.60 | bwd_inner_microstep: 4894.26 | bwd_allreduce_microstep: 25.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705
[2024-07-31 12:12:29,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.98 | bwd_microstep: 4982.97 | bwd_inner_microstep: 4950.00 | bwd_allreduce_microstep: 32.90 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3692
[2024-07-31 12:12:38,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.31 | bwd_microstep: 5057.50 | bwd_inner_microstep: 4989.84 | bwd_allreduce_microstep: 67.60 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2664
[2024-07-31 12:12:47,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.22 | bwd_microstep: 5087.78 | bwd_inner_microstep: 4691.42 | bwd_allreduce_microstep: 396.29 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0,
dynamic token length: 3674
[2024-07-31 12:12:55,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 12:12:55,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.05 | bwd_microstep: 5077.88 | bwd_inner_microstep: 4992.89 | bwd_allreduce_microstep: 84.93 | step_microstep: 182.00
[2024-07-31 12:12:55,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28721.43 | bwd: 40388.67 | bwd_inner: 39344.41 | bwd_allreduce: 1043.79 | step: 182.59
41%|████▏ | 510/1230 [10:01:01<13:57:01, 69.75s/it]
{'loss': 1.2083, 'learning_rate': 1.3195166518484748e-05, 'epoch': 0.41}
41%|████▏ | 510/1230 [10:01:01<13:57:01, 69.75s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3987
[2024-07-31 12:13:05,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.72 | bwd_microstep: 5563.04 | bwd_inner_microstep: 5490.90 | bwd_allreduce_microstep: 72.07 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2055
[2024-07-31 12:13:14,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.88 | bwd_microstep: 5481.37 | bwd_inner_microstep: 5057.82 | bwd_allreduce_microstep: 423.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263
[2024-07-31 12:13:23,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.72 | bwd_microstep: 5268.69 | bwd_inner_microstep: 4857.97 | bwd_allreduce_microstep: 410.65 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786
[2024-07-31 12:13:31,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.92 | bwd_microstep: 5025.89 | bwd_inner_microstep: 5006.54 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741
[2024-07-31 12:13:40,596] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.50 | bwd_microstep: 5018.45 | bwd_inner_microstep: 4977.56 | bwd_allreduce_microstep: 40.83 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2101 [2024-07-31 12:13:49,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3451.90 | bwd_microstep: 5044.95 | bwd_inner_microstep: 4652.50 | bwd_allreduce_microstep: 392.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 12:13:57,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3394.23 | bwd_microstep: 4985.20 | bwd_inner_microstep: 4938.30 | bwd_allreduce_microstep: 46.83 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-31 12:14:06,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 12:14:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.97 | bwd_microstep: 4980.56 | bwd_inner_microstep: 4961.18 | bwd_allreduce_microstep: 19.31 | step_microstep: 182.85 [2024-07-31 12:14:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28758.74 | bwd: 41368.14 | bwd_inner: 39942.70 | bwd_allreduce: 1424.91 | step: 183.55 42%|████▏ | 511/1230 [10:02:12<13:58:25, 69.97s/it] {'loss': 1.1531, 'learning_rate': 1.317020230497784e-05, 'epoch': 0.42} 42%|████▏ | 511/1230 [10:02:12<13:58:25, 69.97s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 12:14:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.33 | bwd_microstep: 5429.01 | bwd_inner_microstep: 5322.60 | bwd_allreduce_microstep: 106.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2260 [2024-07-31 12:14:23,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3051.34 | 
bwd_microstep: 5043.85 | bwd_inner_microstep: 4652.98 | bwd_allreduce_microstep: 390.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 12:14:31,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.58 | bwd_microstep: 4846.52 | bwd_inner_microstep: 4797.91 | bwd_allreduce_microstep: 48.54 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 12:14:40,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.55 | bwd_microstep: 5124.05 | bwd_inner_microstep: 5051.07 | bwd_allreduce_microstep: 72.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-31 12:14:49,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.50 | bwd_microstep: 5044.40 | bwd_inner_microstep: 4986.74 | bwd_allreduce_microstep: 57.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 12:14:57,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.93 | bwd_microstep: 5006.89 | bwd_inner_microstep: 4974.24 | bwd_allreduce_microstep: 32.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 12:15:06,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.78 | bwd_microstep: 4980.46 | bwd_inner_microstep: 4932.59 | bwd_allreduce_microstep: 47.80 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3667 [2024-07-31 12:15:15,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:15:15,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.27 | bwd_microstep: 5059.40 | bwd_inner_microstep: 4982.21 | bwd_allreduce_microstep: 77.12 | step_microstep: 181.41 [2024-07-31 
12:15:15,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28005.18 | bwd: 40534.56 | bwd_inner: 39700.27 | bwd_allreduce: 833.80 | step: 182.00 42%|████▏ | 512/1230 [10:03:21<13:53:19, 69.64s/it] {'loss': 1.1074, 'learning_rate': 1.3145216107528178e-05, 'epoch': 0.42} 42%|████▏ | 512/1230 [10:03:21<13:53:19, 69.64s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3918 [2024-07-31 12:15:24,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.79 | bwd_microstep: 5301.84 | bwd_inner_microstep: 5246.97 | bwd_allreduce_microstep: 54.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-31 12:15:33,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.79 | bwd_microstep: 5242.75 | bwd_inner_microstep: 5146.33 | bwd_allreduce_microstep: 96.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 12:15:41,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3265.87 | bwd_microstep: 4918.54 | bwd_inner_microstep: 4894.89 | bwd_allreduce_microstep: 23.58 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2219 [2024-07-31 12:15:49,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2987.57 | bwd_microstep: 4832.82 | bwd_inner_microstep: 4459.74 | bwd_allreduce_microstep: 373.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 12:15:57,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3181.30 | bwd_microstep: 4680.24 | bwd_inner_microstep: 4654.45 | bwd_allreduce_microstep: 25.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-31 12:16:05,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.83 | 
bwd_microstep: 5192.93 | bwd_inner_microstep: 4790.45 | bwd_allreduce_microstep: 402.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 12:16:14,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.85 | bwd_microstep: 4977.92 | bwd_inner_microstep: 4946.37 | bwd_allreduce_microstep: 31.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 12:16:22,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 12:16:22,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3185.48 | bwd_microstep: 4684.71 | bwd_inner_microstep: 4665.23 | bwd_allreduce_microstep: 19.42 | step_microstep: 182.21 [2024-07-31 12:16:22,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27170.39 | bwd: 39831.74 | bwd_inner: 38804.38 | bwd_allreduce: 1026.88 | step: 182.79 42%|████▏ | 513/1230 [10:04:28<13:43:53, 68.95s/it] {'loss': 1.2205, 'learning_rate': 1.3120208099403928e-05, 'epoch': 0.42} 42%|████▏ | 513/1230 [10:04:28<13:43:53, 68.95s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2493 [2024-07-31 12:16:31,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.76 | bwd_microstep: 5312.84 | bwd_inner_microstep: 4901.43 | bwd_allreduce_microstep: 411.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 12:16:40,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.24 | bwd_microstep: 5264.32 | bwd_inner_microstep: 4856.34 | bwd_allreduce_microstep: 407.91 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2772 [2024-07-31 12:16:49,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.91 | bwd_microstep: 5429.23 | bwd_inner_microstep: 5008.97 | 
bwd_allreduce_microstep: 420.18 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 12:16:58,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.57 | bwd_microstep: 5136.96 | bwd_inner_microstep: 5068.00 | bwd_allreduce_microstep: 68.89 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2190 [2024-07-31 12:17:07,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.60 | bwd_microstep: 5249.52 | bwd_inner_microstep: 4842.53 | bwd_allreduce_microstep: 406.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 12:17:15,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.88 | bwd_microstep: 5125.06 | bwd_inner_microstep: 5072.57 | bwd_allreduce_microstep: 52.41 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 12:17:24,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.55 | bwd_microstep: 5103.20 | bwd_inner_microstep: 4706.81 | bwd_allreduce_microstep: 396.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2132 [2024-07-31 12:17:33,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:17:33,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.68 | bwd_microstep: 5219.24 | bwd_inner_microstep: 4814.50 | bwd_allreduce_microstep: 404.67 | step_microstep: 183.10 [2024-07-31 12:17:33,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28767.11 | bwd: 41840.37 | bwd_inner: 39271.10 | bwd_allreduce: 2568.76 | step: 183.70 42%|████▏ | 514/1230 [10:05:39<13:49:52, 69.54s/it] {'loss': 1.181, 'learning_rate': 1.3095178454024499e-05, 'epoch': 0.42} 42%|████▏ | 514/1230 [10:05:39<13:49:52, 69.54s/it]dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3957 [2024-07-31 12:17:42,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.69 | bwd_microstep: 5525.81 | bwd_inner_microstep: 5447.25 | bwd_allreduce_microstep: 78.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-31 12:17:51,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.37 | bwd_microstep: 4930.39 | bwd_inner_microstep: 4866.98 | bwd_allreduce_microstep: 63.33 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775 [2024-07-31 12:17:59,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.57 | bwd_microstep: 5119.34 | bwd_inner_microstep: 5070.44 | bwd_allreduce_microstep: 48.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 12:18:08,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.35 | bwd_microstep: 5098.07 | bwd_inner_microstep: 4703.58 | bwd_allreduce_microstep: 394.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 12:18:16,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.99 | bwd_microstep: 4858.91 | bwd_inner_microstep: 4816.78 | bwd_allreduce_microstep: 42.05 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 12:18:24,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.03 | bwd_microstep: 4719.31 | bwd_inner_microstep: 4696.91 | bwd_allreduce_microstep: 22.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-31 12:18:33,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.39 | bwd_microstep: 4963.48 | bwd_inner_microstep: 4932.83 | 
bwd_allreduce_microstep: 30.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 12:18:41,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 12:18:41,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.00 | bwd_microstep: 5130.55 | bwd_inner_microstep: 4731.96 | bwd_allreduce_microstep: 398.52 | step_microstep: 181.56 [2024-07-31 12:18:41,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27662.28 | bwd: 40345.82 | bwd_inner: 39266.67 | bwd_allreduce: 1078.67 | step: 182.28 42%|████▏ | 515/1230 [10:06:47<13:44:24, 69.18s/it] {'loss': 1.1736, 'learning_rate': 1.3070127344959348e-05, 'epoch': 0.42} 42%|████▏ | 515/1230 [10:06:47<13:44:24, 69.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 12:18:50,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.52 | bwd_microstep: 5232.01 | bwd_inner_microstep: 5181.58 | bwd_allreduce_microstep: 50.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3946 [2024-07-31 12:18:59,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.77 | bwd_microstep: 5184.34 | bwd_inner_microstep: 5165.04 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3799 [2024-07-31 12:19:08,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.28 | bwd_microstep: 5087.62 | bwd_inner_microstep: 5054.60 | bwd_allreduce_microstep: 32.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 12:19:16,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3204.33 | bwd_microstep: 4759.51 | bwd_inner_microstep: 4724.08 | bwd_allreduce_microstep: 35.36 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 12:19:24,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3041.16 | bwd_microstep: 4945.60 | bwd_inner_microstep: 4566.17 | bwd_allreduce_microstep: 379.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 12:19:33,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.66 | bwd_microstep: 4932.78 | bwd_inner_microstep: 4907.12 | bwd_allreduce_microstep: 25.59 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2168 [2024-07-31 12:19:41,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3440.57 | bwd_microstep: 5036.28 | bwd_inner_microstep: 4647.34 | bwd_allreduce_microstep: 388.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 12:19:49,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 12:19:49,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.50 | bwd_microstep: 4832.49 | bwd_inner_microstep: 4795.53 | bwd_allreduce_microstep: 36.88 | step_microstep: 182.64 [2024-07-31 12:19:49,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27729.69 | bwd: 40010.63 | bwd_inner: 39041.39 | bwd_allreduce: 968.74 | step: 183.21 42%|████▏ | 516/1230 [10:07:55<13:39:17, 68.85s/it] {'loss': 1.2039, 'learning_rate': 1.3045054945926775e-05, 'epoch': 0.42} 42%|████▏ | 516/1230 [10:07:55<13:39:17, 68.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320 [2024-07-31 12:19:59,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.50 | bwd_microstep: 5604.09 | bwd_inner_microstep: 5175.06 | bwd_allreduce_microstep: 428.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3968 [2024-07-31 
12:20:08,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.69 | bwd_microstep: 5299.53 | bwd_inner_microstep: 5248.98 | bwd_allreduce_microstep: 50.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-31 12:20:17,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.87 | bwd_microstep: 5179.43 | bwd_inner_microstep: 5094.49 | bwd_allreduce_microstep: 84.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-31 12:20:25,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.72 | bwd_microstep: 5144.52 | bwd_inner_microstep: 5069.43 | bwd_allreduce_microstep: 75.02 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2184 [2024-07-31 12:20:34,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.60 | bwd_microstep: 5050.96 | bwd_inner_microstep: 4658.22 | bwd_allreduce_microstep: 392.67 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3695 [2024-07-31 12:20:42,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3106.82 | bwd_microstep: 4951.79 | bwd_inner_microstep: 4896.36 | bwd_allreduce_microstep: 55.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3673 [2024-07-31 12:20:51,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.31 | bwd_microstep: 5038.70 | bwd_inner_microstep: 4962.96 | bwd_allreduce_microstep: 75.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3651 [2024-07-31 12:20:59,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 12:20:59,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.12 | bwd_microstep: 5001.68 | bwd_inner_microstep: 
4928.64 | bwd_allreduce_microstep: 72.97 | step_microstep: 183.40 [2024-07-31 12:20:59,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28277.54 | bwd: 41270.68 | bwd_inner: 40034.08 | bwd_allreduce: 1236.10 | step: 183.99 42%|████▏ | 517/1230 [10:09:05<13:41:49, 69.16s/it] {'loss': 1.2059, 'learning_rate': 1.3019961430792711e-05, 'epoch': 0.42} 42%|████▏ | 517/1230 [10:09:05<13:41:49, 69.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 12:21:08,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.66 | bwd_microstep: 5226.06 | bwd_inner_microstep: 5204.27 | bwd_allreduce_microstep: 21.71 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2273 [2024-07-31 12:21:17,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3347.11 | bwd_microstep: 5241.71 | bwd_inner_microstep: 4836.49 | bwd_allreduce_microstep: 405.16 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1547 [2024-07-31 12:21:26,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.46 | bwd_microstep: 5475.17 | bwd_inner_microstep: 5053.22 | bwd_allreduce_microstep: 421.88 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 12:21:35,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.08 | bwd_microstep: 5229.37 | bwd_inner_microstep: 5139.67 | bwd_allreduce_microstep: 89.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-31 12:21:44,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.60 | bwd_microstep: 4996.36 | bwd_inner_microstep: 4959.11 | bwd_allreduce_microstep: 37.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-31 12:21:52,319] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.26 | bwd_microstep: 4905.25 | bwd_inner_microstep: 4877.93 | bwd_allreduce_microstep: 27.25 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-31 12:22:00,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.45 | bwd_microstep: 5097.98 | bwd_inner_microstep: 4700.99 | bwd_allreduce_microstep: 396.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 12:22:09,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 12:22:09,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.72 | bwd_microstep: 4798.86 | bwd_inner_microstep: 4779.54 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.34 [2024-07-31 12:22:09,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28001.25 | bwd: 40970.73 | bwd_inner: 39551.16 | bwd_allreduce: 1419.07 | step: 182.03 42%|████▏ | 518/1230 [10:10:15<13:41:10, 69.20s/it] {'loss': 1.138, 'learning_rate': 1.2994846973569526e-05, 'epoch': 0.42} 42%|████▏ | 518/1230 [10:10:15<13:41:10, 69.20s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3559 [2024-07-31 12:22:17,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.03 | bwd_microstep: 5283.56 | bwd_inner_microstep: 5134.41 | bwd_allreduce_microstep: 149.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801 [2024-07-31 12:22:26,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.35 | bwd_microstep: 5077.16 | bwd_inner_microstep: 5050.04 | bwd_allreduce_microstep: 27.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846 [2024-07-31 12:22:35,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.16 | 
bwd_microstep: 5096.61 | bwd_inner_microstep: 5077.28 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 12:22:44,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.59 | bwd_microstep: 4974.64 | bwd_inner_microstep: 4955.24 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 12:22:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.41 | bwd_microstep: 5128.87 | bwd_inner_microstep: 4731.79 | bwd_allreduce_microstep: 397.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 12:23:01,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.32 | bwd_microstep: 5032.77 | bwd_inner_microstep: 4972.82 | bwd_allreduce_microstep: 59.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 12:23:10,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.49 | bwd_microstep: 5072.42 | bwd_inner_microstep: 5008.37 | bwd_allreduce_microstep: 63.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-31 12:23:18,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 12:23:18,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.07 | bwd_microstep: 4904.51 | bwd_inner_microstep: 4879.93 | bwd_allreduce_microstep: 24.52 | step_microstep: 183.00 [2024-07-31 12:23:18,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28833.33 | bwd: 40570.51 | bwd_inner: 39809.81 | bwd_allreduce: 760.21 | step: 183.59 42%|████▏ | 519/1230 [10:11:24<13:41:56, 69.36s/it] {'loss': 1.1806, 'learning_rate': 1.2969711748414804e-05, 'epoch': 0.42} 42%|████▏ | 519/1230 
[10:11:24<13:41:56, 69.36s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4044 [2024-07-31 12:23:28,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3868.21 | bwd_microstep: 5390.46 | bwd_inner_microstep: 5371.29 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3934 [2024-07-31 12:23:37,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.43 | bwd_microstep: 5165.58 | bwd_inner_microstep: 5146.11 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3826 [2024-07-31 12:23:45,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3395.28 | bwd_microstep: 4998.00 | bwd_inner_microstep: 4971.38 | bwd_allreduce_microstep: 26.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 12:23:54,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.19 | bwd_microstep: 5096.55 | bwd_inner_microstep: 5027.74 | bwd_allreduce_microstep: 68.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 12:24:02,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.33 | bwd_microstep: 4914.14 | bwd_inner_microstep: 4535.96 | bwd_allreduce_microstep: 378.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 12:24:10,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.39 | bwd_microstep: 5030.89 | bwd_inner_microstep: 5005.20 | bwd_allreduce_microstep: 25.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 12:24:19,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.77 | bwd_microstep: 
4902.34 | bwd_inner_microstep: 4883.04 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 12:24:28,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.86 [2024-07-31 12:24:28,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.18 | bwd_microstep: 4995.20 | bwd_inner_microstep: 4962.13 | bwd_allreduce_microstep: 33.00 | step_microstep: 183.27 [2024-07-31 12:24:28,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28785.68 | bwd: 40493.14 | bwd_inner: 39902.79 | bwd_allreduce: 589.86 | step: 183.86 42%|████▏ | 520/1230 [10:12:34<13:41:41, 69.44s/it] {'loss': 1.1989, 'learning_rate': 1.2944555929630149e-05, 'epoch': 0.42} 42%|████▏ | 520/1230 [10:12:34<13:41:41, 69.44s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 12:24:37,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.21 | bwd_microstep: 5273.52 | bwd_inner_microstep: 5244.95 | bwd_allreduce_microstep: 28.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-31 12:24:46,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.90 | bwd_microstep: 5043.21 | bwd_inner_microstep: 5023.88 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3824 [2024-07-31 12:24:55,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.04 | bwd_microstep: 5055.71 | bwd_inner_microstep: 5036.31 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-31 12:25:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.64 | bwd_microstep: 5305.05 | bwd_inner_microstep: 5235.12 | bwd_allreduce_microstep: 69.87 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 12:25:12,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.20 | bwd_microstep: 5166.56 | bwd_inner_microstep: 4763.84 | bwd_allreduce_microstep: 402.65 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 659 [2024-07-31 12:25:21,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.19 | bwd_microstep: 5256.97 | bwd_inner_microstep: 4850.29 | bwd_allreduce_microstep: 406.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 12:25:30,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.31 | bwd_microstep: 5094.76 | bwd_inner_microstep: 4699.93 | bwd_allreduce_microstep: 394.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 12:25:39,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 12:25:39,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.20 | bwd_microstep: 5193.89 | bwd_inner_microstep: 5103.09 | bwd_allreduce_microstep: 90.73 | step_microstep: 181.04 [2024-07-31 12:25:39,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29052.59 | bwd: 41389.65 | bwd_inner: 39957.35 | bwd_allreduce: 1431.82 | step: 181.62 42%|████▏ | 521/1230 [10:13:45<13:45:15, 69.84s/it] {'loss': 1.1866, 'learning_rate': 1.291937969165998e-05, 'epoch': 0.42} 42%|████▏ | 521/1230 [10:13:45<13:45:15, 69.84s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3937 [2024-07-31 12:25:48,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.11 | bwd_microstep: 5266.94 | bwd_inner_microstep: 5207.85 | bwd_allreduce_microstep: 59.02 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2221 [2024-07-31 12:25:57,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.58 | bwd_microstep: 5222.29 | bwd_inner_microstep: 4817.12 | bwd_allreduce_microstep: 405.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-31 12:26:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.95 | bwd_microstep: 5239.66 | bwd_inner_microstep: 4831.09 | bwd_allreduce_microstep: 408.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-31 12:26:14,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.13 | bwd_microstep: 5057.02 | bwd_inner_microstep: 5015.88 | bwd_allreduce_microstep: 41.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-31 12:26:22,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.80 | bwd_microstep: 4782.44 | bwd_inner_microstep: 4763.06 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2126 [2024-07-31 12:26:31,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.04 | bwd_microstep: 5226.24 | bwd_inner_microstep: 4819.80 | bwd_allreduce_microstep: 406.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 12:26:40,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.86 | bwd_microstep: 5075.40 | bwd_inner_microstep: 5019.35 | bwd_allreduce_microstep: 55.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 12:26:48,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 12:26:48,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.58 | 
bwd_microstep: 5181.36 | bwd_inner_microstep: 4778.64 | bwd_allreduce_microstep: 402.65 | step_microstep: 181.92 [2024-07-31 12:26:48,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28265.96 | bwd: 41051.33 | bwd_inner: 39252.74 | bwd_allreduce: 1798.09 | step: 182.51 42%|████▏ | 522/1230 [10:14:54<13:43:24, 69.78s/it] {'loss': 1.195, 'learning_rate': 1.2894183209090304e-05, 'epoch': 0.42} 42%|████▏ | 522/1230 [10:14:54<13:43:24, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3975 [2024-07-31 12:26:58,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.94 | bwd_microstep: 5337.32 | bwd_inner_microstep: 5293.26 | bwd_allreduce_microstep: 44.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 12:27:06,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.49 | bwd_microstep: 5194.06 | bwd_inner_microstep: 5146.64 | bwd_allreduce_microstep: 47.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-31 12:27:15,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.50 | bwd_microstep: 5001.08 | bwd_inner_microstep: 4981.73 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 12:27:24,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.24 | bwd_microstep: 5192.11 | bwd_inner_microstep: 5132.47 | bwd_allreduce_microstep: 59.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 12:27:33,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.82 | bwd_microstep: 5164.78 | bwd_inner_microstep: 5109.99 | bwd_allreduce_microstep: 54.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token 
length: 2149 [2024-07-31 12:27:41,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.66 | bwd_microstep: 5059.69 | bwd_inner_microstep: 4666.23 | bwd_allreduce_microstep: 393.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 12:27:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.88 | bwd_microstep: 5023.77 | bwd_inner_microstep: 4639.12 | bwd_allreduce_microstep: 384.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 12:27:58,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 12:27:58,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.83 | bwd_microstep: 5028.18 | bwd_inner_microstep: 4970.56 | bwd_allreduce_microstep: 57.54 | step_microstep: 183.26 [2024-07-31 12:27:58,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28390.27 | bwd: 41000.97 | bwd_inner: 39939.95 | bwd_allreduce: 1060.52 | step: 183.95 43%|████▎ | 523/1230 [10:16:04<13:42:03, 69.76s/it] {'loss': 1.1406, 'learning_rate': 1.2868966656647519e-05, 'epoch': 0.43} dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3747 [2024-07-31 12:28:07,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.39 | bwd_microstep: 5370.98 | bwd_inner_microstep: 5273.20 | bwd_allreduce_microstep: 97.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3892 [2024-07-31 12:28:16,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.03 | bwd_microstep: 5241.27 | bwd_inner_microstep: 5187.65 | bwd_allreduce_microstep: 53.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 12:28:25,407] [INFO] [logging.py:96:log_dist] [Rank 0]
time (ms) | fwd_microstep: 3592.46 | bwd_microstep: 5164.82 | bwd_inner_microstep: 5090.92 | bwd_allreduce_microstep: 73.84 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2815 [2024-07-31 12:28:33,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3073.79 | bwd_microstep: 4976.85 | bwd_inner_microstep: 4611.19 | bwd_allreduce_microstep: 365.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3753 [2024-07-31 12:28:42,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.98 | bwd_microstep: 5103.25 | bwd_inner_microstep: 5073.88 | bwd_allreduce_microstep: 29.30 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2162 [2024-07-31 12:28:51,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.30 | bwd_microstep: 5165.53 | bwd_inner_microstep: 4763.95 | bwd_allreduce_microstep: 401.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 12:28:59,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.40 | bwd_microstep: 5068.06 | bwd_inner_microstep: 5005.49 | bwd_allreduce_microstep: 62.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 12:29:08,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:29:08,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.16 | bwd_microstep: 4944.58 | bwd_inner_microstep: 4898.35 | bwd_allreduce_microstep: 46.16 | step_microstep: 181.83 [2024-07-31 12:29:08,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28380.41 | bwd: 41035.32 | bwd_inner: 39904.56 | bwd_allreduce: 1130.27 | step: 182.41 43%|████▎ | 524/1230 [10:17:14<13:40:50, 69.76s/it] {'loss': 1.2052, 'learning_rate': 
1.2843730209197203e-05, 'epoch': 0.43} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3526 [2024-07-31 12:29:17,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.91 | bwd_microstep: 5353.67 | bwd_inner_microstep: 5204.16 | bwd_allreduce_microstep: 149.44 | step_microstep: 0.12 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-31 12:29:26,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.77 | bwd_microstep: 5100.01 | bwd_inner_microstep: 5072.26 | bwd_allreduce_microstep: 27.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-31 12:29:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.42 | bwd_microstep: 5172.85 | bwd_inner_microstep: 5100.95 | bwd_allreduce_microstep: 71.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 12:29:43,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.64 | bwd_microstep: 5179.18 | bwd_inner_microstep: 4778.65 | bwd_allreduce_microstep: 400.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 12:29:52,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.40 | bwd_microstep: 5023.86 | bwd_inner_microstep: 4998.25 | bwd_allreduce_microstep: 25.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-31 12:30:01,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.08 | bwd_microstep: 5128.95 | bwd_inner_microstep: 5079.89 | bwd_allreduce_microstep: 48.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-31 12:30:10,015] [INFO] [logging.py:96:log_dist] [Rank
0] time (ms) | fwd_microstep: 3574.24 | bwd_microstep: 5094.99 | bwd_inner_microstep: 5050.73 | bwd_allreduce_microstep: 44.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 12:30:18,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:30:18,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.62 | bwd_microstep: 5156.56 | bwd_inner_microstep: 5088.15 | bwd_allreduce_microstep: 68.34 | step_microstep: 181.65 [2024-07-31 12:30:18,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29031.98 | bwd: 41210.05 | bwd_inner: 40372.97 | bwd_allreduce: 836.60 | step: 182.26 43%|████▎ | 525/1230 [10:18:24<13:42:32, 70.00s/it] {'loss': 1.196, 'learning_rate': 1.2818474041742885e-05, 'epoch': 0.43} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2352 [2024-07-31 12:30:28,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.36 | bwd_microstep: 5491.66 | bwd_inner_microstep: 5070.79 | bwd_allreduce_microstep: 420.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-31 12:30:36,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.55 | bwd_microstep: 5075.43 | bwd_inner_microstep: 5008.80 | bwd_allreduce_microstep: 66.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 12:30:45,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.29 | bwd_microstep: 5003.24 | bwd_inner_microstep: 4980.60 | bwd_allreduce_microstep: 22.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-31 12:30:53,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.33 | bwd_microstep: 4690.43 |
bwd_inner_microstep: 4671.06 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-31 12:31:02,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.29 | bwd_microstep: 5093.76 | bwd_inner_microstep: 4696.65 | bwd_allreduce_microstep: 397.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 12:31:10,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.08 | bwd_microstep: 5053.18 | bwd_inner_microstep: 4997.06 | bwd_allreduce_microstep: 56.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 12:31:19,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.93 | bwd_microstep: 5080.72 | bwd_inner_microstep: 5021.24 | bwd_allreduce_microstep: 59.41 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2109 [2024-07-31 12:31:28,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 12:31:28,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.42 | bwd_microstep: 5091.08 | bwd_inner_microstep: 4696.03 | bwd_allreduce_microstep: 394.98 | step_microstep: 182.33 [2024-07-31 12:31:28,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28347.15 | bwd: 40579.46 | bwd_inner: 39142.17 | bwd_allreduce: 1436.79 | step: 182.92 43%|████▎ | 526/1230 [10:19:34<13:38:45, 69.78s/it] {'loss': 1.1633, 'learning_rate': 1.2793198329424858e-05, 'epoch': 0.43} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-31 12:31:36,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3283.65 | bwd_microstep: 4934.22 | bwd_inner_microstep: 4902.37 | bwd_allreduce_microstep: 31.77 | step_microstep:
0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2438 [2024-07-31 12:31:45,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.16 | bwd_microstep: 5354.42 | bwd_inner_microstep: 4938.15 | bwd_allreduce_microstep: 416.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-31 12:31:54,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.40 | bwd_microstep: 5174.17 | bwd_inner_microstep: 5135.31 | bwd_allreduce_microstep: 38.79 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 12:32:03,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.38 | bwd_microstep: 4891.90 | bwd_inner_microstep: 4872.53 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3628 [2024-07-31 12:32:11,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.37 | bwd_microstep: 5164.08 | bwd_inner_microstep: 5068.16 | bwd_allreduce_microstep: 95.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 12:32:20,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.16 | bwd_microstep: 5055.64 | bwd_inner_microstep: 4663.27 | bwd_allreduce_microstep: 392.29 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 12:32:29,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.24 | bwd_microstep: 5074.82 | bwd_inner_microstep: 5027.01 | bwd_allreduce_microstep: 47.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 12:32:38,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 12:32:38,073] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.68 | bwd_microstep: 5047.14 | bwd_inner_microstep: 4983.52 | bwd_allreduce_microstep: 63.54 | step_microstep: 182.99 [2024-07-31 12:32:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28801.95 | bwd: 40696.37 | bwd_inner: 39590.26 | bwd_allreduce: 1105.61 | step: 183.68 43%|████▎ | 527/1230 [10:20:43<13:37:47, 69.80s/it] {'loss': 1.1886, 'learning_rate': 1.2767903247518943e-05, 'epoch': 0.43} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 12:32:47,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3899.19 | bwd_microstep: 5439.20 | bwd_inner_microstep: 5413.34 | bwd_allreduce_microstep: 25.79 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2265 [2024-07-31 12:32:56,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.31 | bwd_microstep: 5285.54 | bwd_inner_microstep: 4876.61 | bwd_allreduce_microstep: 408.86 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-31 12:33:04,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.92 | bwd_microstep: 4830.47 | bwd_inner_microstep: 4784.49 | bwd_allreduce_microstep: 45.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 12:33:13,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.00 | bwd_microstep: 5212.77 | bwd_inner_microstep: 4807.58 | bwd_allreduce_microstep: 405.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 12:33:21,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.58 | bwd_microstep: 5174.08 | bwd_inner_microstep: 5097.10 | bwd_allreduce_microstep: 76.91 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 646 [2024-07-31 12:33:29,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2982.24 | bwd_microstep: 4963.58 | bwd_inner_microstep: 4585.38 | bwd_allreduce_microstep: 378.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2122 [2024-07-31 12:33:38,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.62 | bwd_microstep: 5094.65 | bwd_inner_microstep: 4699.85 | bwd_allreduce_microstep: 394.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 12:33:46,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 12:33:46,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.03 | bwd_microstep: 4784.12 | bwd_inner_microstep: 4746.68 | bwd_allreduce_microstep: 37.37 | step_microstep: 181.66 [2024-07-31 12:33:46,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27573.79 | bwd: 40784.38 | bwd_inner: 39010.96 | bwd_allreduce: 1772.94 | step: 182.25 43%|████▎ | 528/1230 [10:21:52<13:32:43, 69.46s/it] {'loss': 1.1421, 'learning_rate': 1.2742588971435276e-05, 'epoch': 0.43} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3869 [2024-07-31 12:33:55,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3832.41 | bwd_microstep: 5212.60 | bwd_inner_microstep: 5181.60 | bwd_allreduce_microstep: 30.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3903 [2024-07-31 12:34:04,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3424.62 | bwd_microstep: 4969.40 | bwd_inner_microstep: 4950.02 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2057
[2024-07-31 12:34:13,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.49 | bwd_microstep: 5378.44 | bwd_inner_microstep: 4961.91 | bwd_allreduce_microstep: 416.46 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2194 [2024-07-31 12:34:22,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.79 | bwd_microstep: 5257.64 | bwd_inner_microstep: 4849.38 | bwd_allreduce_microstep: 408.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-31 12:34:30,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3287.73 | bwd_microstep: 5074.30 | bwd_inner_microstep: 4681.61 | bwd_allreduce_microstep: 392.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 12:34:38,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.35 | bwd_microstep: 4767.21 | bwd_inner_microstep: 4747.90 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 12:34:47,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.77 | bwd_microstep: 5056.47 | bwd_inner_microstep: 4995.84 | bwd_allreduce_microstep: 60.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 12:34:55,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 12:34:55,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.63 | bwd_microstep: 5019.01 | bwd_inner_microstep: 4961.78 | bwd_allreduce_microstep: 57.16 | step_microstep: 181.18 [2024-07-31 12:34:55,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28036.69 | bwd: 40735.04 | bwd_inner: 39330.00 | bwd_allreduce: 1404.56 | step: 181.77 43%|████▎ | 529/1230 [10:23:01<13:30:18, 
69.36s/it] {'loss': 1.1137, 'learning_rate': 1.2717255676717106e-05, 'epoch': 0.43} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282 [2024-07-31 12:35:05,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.53 | bwd_microstep: 5528.73 | bwd_inner_microstep: 5106.06 | bwd_allreduce_microstep: 422.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-31 12:35:13,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.51 | bwd_microstep: 5045.11 | bwd_inner_microstep: 5022.42 | bwd_allreduce_microstep: 22.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3813 [2024-07-31 12:35:22,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.50 | bwd_microstep: 5112.34 | bwd_inner_microstep: 5070.60 | bwd_allreduce_microstep: 41.67 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-31 12:35:31,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.74 | bwd_microstep: 5208.28 | bwd_inner_microstep: 5147.17 | bwd_allreduce_microstep: 61.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 12:35:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3301.97 | bwd_microstep: 4750.92 | bwd_inner_microstep: 4724.45 | bwd_allreduce_microstep: 26.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 12:35:48,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.07 | bwd_microstep: 5174.65 | bwd_inner_microstep: 4771.90 | bwd_allreduce_microstep: 402.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31
12:35:56,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.83 | bwd_microstep: 4932.89 | bwd_inner_microstep: 4902.85 | bwd_allreduce_microstep: 29.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 12:36:05,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 12:36:05,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.75 | bwd_microstep: 5141.68 | bwd_inner_microstep: 5072.96 | bwd_allreduce_microstep: 68.65 | step_microstep: 181.56 [2024-07-31 12:36:05,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28606.80 | bwd: 40894.58 | bwd_inner: 39818.35 | bwd_allreduce: 1075.73 | step: 182.16 43%|████▎ | 530/1230 [10:24:11<13:30:49, 69.50s/it] {'loss': 1.2006, 'learning_rate': 1.2691903539039561e-05, 'epoch': 0.43} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-31 12:36:15,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.29 | bwd_microstep: 5669.00 | bwd_inner_microstep: 5234.95 | bwd_allreduce_microstep: 433.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820 [2024-07-31 12:36:23,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.28 | bwd_microstep: 5088.70 | bwd_inner_microstep: 5063.01 | bwd_allreduce_microstep: 25.62 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 12:36:32,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.31 | bwd_microstep: 4973.15 | bwd_inner_microstep: 4953.69 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2078 [2024-07-31 12:36:41,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep:
3543.98 | bwd_microstep: 5225.81 | bwd_inner_microstep: 4820.39 | bwd_allreduce_microstep: 405.36 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 12:36:50,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.32 | bwd_microstep: 5085.65 | bwd_inner_microstep: 5021.68 | bwd_allreduce_microstep: 63.90 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3753 [2024-07-31 12:36:58,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.70 | bwd_microstep: 5011.95 | bwd_inner_microstep: 4969.59 | bwd_allreduce_microstep: 42.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 12:37:07,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.81 | bwd_microstep: 5132.91 | bwd_inner_microstep: 4733.68 | bwd_allreduce_microstep: 399.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 12:37:16,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:37:16,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.54 | bwd_microstep: 5060.58 | bwd_inner_microstep: 5002.92 | bwd_allreduce_microstep: 57.58 | step_microstep: 182.13 [2024-07-31 12:37:16,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28957.13 | bwd: 41247.73 | bwd_inner: 39799.85 | bwd_allreduce: 1447.40 | step: 182.83 43%|████▎ | 531/1230 [10:25:22<13:33:18, 69.81s/it] {'loss': 1.1825, 'learning_rate': 1.2666532734208437e-05, 'epoch': 0.43} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-31 12:37:25,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.06 | bwd_microstep: 5408.85 | bwd_inner_microstep: 4997.50 |
bwd_allreduce_microstep: 411.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3998 [2024-07-31 12:37:34,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3833.15 | bwd_microstep: 5239.00 | bwd_inner_microstep: 5219.61 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 12:37:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.57 | bwd_microstep: 5375.74 | bwd_inner_microstep: 5270.22 | bwd_allreduce_microstep: 105.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-31 12:37:52,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3806.71 | bwd_microstep: 5164.66 | bwd_inner_microstep: 5145.35 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 12:38:01,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.72 | bwd_microstep: 5038.27 | bwd_inner_microstep: 5018.98 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 12:38:09,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.07 | bwd_microstep: 5148.14 | bwd_inner_microstep: 5093.48 | bwd_allreduce_microstep: 54.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 12:38:18,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.61 | bwd_microstep: 5234.83 | bwd_inner_microstep: 4829.80 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 12:38:27,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 
92.58 [2024-07-31 12:38:27,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3395.25 | bwd_microstep: 4989.06 | bwd_inner_microstep: 4940.98 | bwd_allreduce_microstep: 48.01 | step_microstep: 181.84 [2024-07-31 12:38:27,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29166.05 | bwd: 41598.52 | bwd_inner: 40515.85 | bwd_allreduce: 1082.19 | step: 182.42 43%|████▎ | 532/1230 [10:26:33<13:36:38, 70.20s/it] {'loss': 1.2314, 'learning_rate': 1.264114343815898e-05, 'epoch': 0.43} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3960 [2024-07-31 12:38:36,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.97 | bwd_microstep: 5326.36 | bwd_inner_microstep: 5274.93 | bwd_allreduce_microstep: 51.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3567 [2024-07-31 12:38:45,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.08 | bwd_microstep: 5179.11 | bwd_inner_microstep: 5091.53 | bwd_allreduce_microstep: 87.51 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3844 [2024-07-31 12:38:53,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.49 | bwd_microstep: 5016.06 | bwd_inner_microstep: 4996.69 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-31 12:39:01,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3178.66 | bwd_microstep: 4687.07 | bwd_inner_microstep: 4663.91 | bwd_allreduce_microstep: 23.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 12:39:10,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.63 | bwd_microstep: 4976.50 | bwd_inner_microstep: 4957.14 |
bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 12:39:19,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.03 | bwd_microstep: 5000.53 | bwd_inner_microstep: 4948.30 | bwd_allreduce_microstep: 52.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 12:39:27,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.65 | bwd_microstep: 5128.05 | bwd_inner_microstep: 5059.12 | bwd_allreduce_microstep: 68.87 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 12:39:36,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 12:39:36,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.67 | bwd_microstep: 5180.97 | bwd_inner_microstep: 5104.52 | bwd_allreduce_microstep: 76.39 | step_microstep: 183.10 [2024-07-31 12:39:36,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28600.09 | bwd: 40494.64 | bwd_inner: 40096.10 | bwd_allreduce: 398.06 | step: 183.68 43%|████▎ | 533/1230 [10:27:42<13:32:47, 69.97s/it] {'loss': 1.189, 'learning_rate': 1.2615735826954664e-05, 'epoch': 0.43} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 12:39:45,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.94 | bwd_microstep: 5237.65 | bwd_inner_microstep: 5156.80 | bwd_allreduce_microstep: 80.78 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-31 12:39:54,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.78 | bwd_microstep: 5132.26 | bwd_inner_microstep: 5051.02 | bwd_allreduce_microstep: 81.17 | step_microstep: 0.08 dynamic ViT batch size: 20,
images per sample: 10.0, dynamic token length: 3586 [2024-07-31 12:40:03,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.85 | bwd_microstep: 5145.22 | bwd_inner_microstep: 5070.07 | bwd_allreduce_microstep: 75.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 12:40:11,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.32 | bwd_microstep: 4807.37 | bwd_inner_microstep: 4787.92 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 12:40:20,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.03 | bwd_microstep: 5248.35 | bwd_inner_microstep: 5166.72 | bwd_allreduce_microstep: 81.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-31 12:40:28,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.79 | bwd_microstep: 4997.83 | bwd_inner_microstep: 4611.77 | bwd_allreduce_microstep: 385.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 12:40:36,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.46 | bwd_microstep: 5109.05 | bwd_inner_microstep: 4713.74 | bwd_allreduce_microstep: 395.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-31 12:40:45,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 12:40:45,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.07 | bwd_microstep: 5075.96 | bwd_inner_microstep: 4681.93 | bwd_allreduce_microstep: 393.96 | step_microstep: 181.79 [2024-07-31 12:40:45,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27770.13 | bwd: 40753.68 | bwd_inner: 39239.92 | bwd_allreduce: 1513.25 | 
step: 182.39 43%|████▎ | 534/1230 [10:28:51<13:27:45, 69.63s/it] {'loss': 1.1816, 'learning_rate': 1.2590310076785972e-05, 'epoch': 0.43} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2404 [2024-07-31 12:40:54,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.98 | bwd_microstep: 5382.04 | bwd_inner_microstep: 4971.68 | bwd_allreduce_microstep: 410.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3906 [2024-07-31 12:41:03,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.45 | bwd_microstep: 5143.13 | bwd_inner_microstep: 5123.79 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-31 12:41:12,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.31 | bwd_microstep: 5303.99 | bwd_inner_microstep: 4891.06 | bwd_allreduce_microstep: 412.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-31 12:41:21,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.64 | bwd_microstep: 5308.99 | bwd_inner_microstep: 4898.36 | bwd_allreduce_microstep: 410.56 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3686 [2024-07-31 12:41:30,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.96 | bwd_microstep: 5298.34 | bwd_inner_microstep: 5228.33 | bwd_allreduce_microstep: 69.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3885 [2024-07-31 12:41:39,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.88 | bwd_microstep: 5115.87 | bwd_inner_microstep: 5096.50 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample:
13.0, dynamic token length: 3725 [2024-07-31 12:41:48,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.87 | bwd_microstep: 4984.28 | bwd_inner_microstep: 4964.89 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1147 [2024-07-31 12:41:56,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 12:41:56,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3440.92 | bwd_microstep: 5074.63 | bwd_inner_microstep: 4683.26 | bwd_allreduce_microstep: 391.30 | step_microstep: 182.15 [2024-07-31 12:41:56,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29169.87 | bwd: 41611.26 | bwd_inner: 39857.81 | bwd_allreduce: 1752.95 | step: 182.74 43%|████▎ | 535/1230 [10:30:02<13:31:45, 70.08s/it] {'loss': 1.1777, 'learning_rate': 1.256486636396917e-05, 'epoch': 0.43} 43%|████▎ | 535/1230 [10:30:02<13:31:45, 70.08s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3949 [2024-07-31 12:42:05,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.02 | bwd_microstep: 5312.62 | bwd_inner_microstep: 5258.21 | bwd_allreduce_microstep: 54.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778 [2024-07-31 12:42:14,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.47 | bwd_microstep: 5070.09 | bwd_inner_microstep: 5044.16 | bwd_allreduce_microstep: 25.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 12:42:23,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.46 | bwd_microstep: 5312.35 | bwd_inner_microstep: 5221.06 | bwd_allreduce_microstep: 91.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3854 [2024-07-31 12:42:32,401] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.42 | bwd_microstep: 5114.09 | bwd_inner_microstep: 5071.76 | bwd_allreduce_microstep: 42.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-31 12:42:41,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.96 | bwd_microstep: 5186.92 | bwd_inner_microstep: 4780.02 | bwd_allreduce_microstep: 406.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 12:42:49,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.03 | bwd_microstep: 5048.50 | bwd_inner_microstep: 4657.83 | bwd_allreduce_microstep: 390.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 12:42:58,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.22 | bwd_microstep: 5094.25 | bwd_inner_microstep: 4700.09 | bwd_allreduce_microstep: 394.09 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 12:43:07,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 12:43:07,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.30 | bwd_microstep: 4983.74 | bwd_inner_microstep: 4938.43 | bwd_allreduce_microstep: 45.21 | step_microstep: 181.36 [2024-07-31 12:43:07,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28883.78 | bwd: 41122.52 | bwd_inner: 39671.48 | bwd_allreduce: 1450.52 | step: 181.95 44%|████▎ | 536/1230 [10:31:12<13:31:28, 70.16s/it] {'loss': 1.141, 'learning_rate': 1.2539404864945087e-05, 'epoch': 0.44} 44%|████▎ | 536/1230 [10:31:12<13:31:28, 70.16s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3877 [2024-07-31 12:43:16,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.74 | 
bwd_microstep: 5282.50 | bwd_inner_microstep: 5249.11 | bwd_allreduce_microstep: 33.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 12:43:25,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.15 | bwd_microstep: 5192.43 | bwd_inner_microstep: 5130.19 | bwd_allreduce_microstep: 62.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 12:43:33,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.39 | bwd_microstep: 5077.15 | bwd_inner_microstep: 5029.76 | bwd_allreduce_microstep: 47.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 12:43:42,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.14 | bwd_microstep: 4988.81 | bwd_inner_microstep: 4969.40 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 12:43:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.52 | bwd_microstep: 5031.36 | bwd_inner_microstep: 4989.61 | bwd_allreduce_microstep: 41.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 12:43:59,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.41 | bwd_microstep: 4805.16 | bwd_inner_microstep: 4785.83 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 12:44:08,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.13 | bwd_microstep: 5031.30 | bwd_inner_microstep: 5007.40 | bwd_allreduce_microstep: 23.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 12:44:17,098] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 12:44:17,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.25 | bwd_microstep: 5244.68 | bwd_inner_microstep: 4837.62 | bwd_allreduce_microstep: 406.99 | step_microstep: 182.30 [2024-07-31 12:44:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29029.63 | bwd: 40653.36 | bwd_inner: 39998.86 | bwd_allreduce: 654.02 | step: 182.91 44%|████▎ | 537/1230 [10:32:22<13:29:50, 70.12s/it] {'loss': 1.1549, 'learning_rate': 1.2513925756277894e-05, 'epoch': 0.44} 44%|████▎ | 537/1230 [10:32:22<13:29:50, 70.12s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3953 [2024-07-31 12:44:26,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.42 | bwd_microstep: 5204.53 | bwd_inner_microstep: 5172.35 | bwd_allreduce_microstep: 32.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3833 [2024-07-31 12:44:34,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.19 | bwd_microstep: 4841.93 | bwd_inner_microstep: 4822.65 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2069 [2024-07-31 12:44:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.97 | bwd_microstep: 5254.24 | bwd_inner_microstep: 4847.16 | bwd_allreduce_microstep: 407.02 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3730 [2024-07-31 12:44:51,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.23 | bwd_microstep: 5130.49 | bwd_inner_microstep: 5060.96 | bwd_allreduce_microstep: 69.46 | step_microstep: 0.09 dynamic ViT batch size: 15, images per sample: 7.5, dynamic token length: 2879 [2024-07-31 12:45:00,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.61 | 
bwd_microstep: 5086.37 | bwd_inner_microstep: 4691.06 | bwd_allreduce_microstep: 395.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 12:45:08,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.11 | bwd_microstep: 4885.44 | bwd_inner_microstep: 4866.02 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-31 12:45:17,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.80 | bwd_microstep: 5087.12 | bwd_inner_microstep: 4692.71 | bwd_allreduce_microstep: 394.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 12:45:26,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-31 12:45:26,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.01 | bwd_microstep: 5077.91 | bwd_inner_microstep: 5019.13 | bwd_allreduce_microstep: 58.71 | step_microstep: 184.49 [2024-07-31 12:45:26,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28405.25 | bwd: 40568.01 | bwd_inner: 39171.99 | bwd_allreduce: 1395.53 | step: 185.19 44%|████▎ | 538/1230 [10:33:32<13:25:52, 69.87s/it] {'loss': 1.2417, 'learning_rate': 1.2488429214653871e-05, 'epoch': 0.44} 44%|████▎ | 538/1230 [10:33:32<13:25:52, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3521 [2024-07-31 12:45:35,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.57 | bwd_microstep: 5296.46 | bwd_inner_microstep: 5185.67 | bwd_allreduce_microstep: 110.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2265 [2024-07-31 12:45:43,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.90 | bwd_microstep: 5115.98 | bwd_inner_microstep: 4722.85 | 
bwd_allreduce_microstep: 393.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 12:45:52,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.97 | bwd_microstep: 5180.93 | bwd_inner_microstep: 5119.52 | bwd_allreduce_microstep: 61.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 12:46:01,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.71 | bwd_microstep: 5178.22 | bwd_inner_microstep: 5098.99 | bwd_allreduce_microstep: 79.16 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2193 [2024-07-31 12:46:10,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.80 | bwd_microstep: 5249.24 | bwd_inner_microstep: 4843.40 | bwd_allreduce_microstep: 405.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-31 12:46:18,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.75 | bwd_microstep: 4728.23 | bwd_inner_microstep: 4699.98 | bwd_allreduce_microstep: 28.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 12:46:26,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.39 | bwd_microstep: 4907.97 | bwd_inner_microstep: 4882.73 | bwd_allreduce_microstep: 25.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 12:46:35,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 12:46:35,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.82 | bwd_microstep: 5043.94 | bwd_inner_microstep: 4978.90 | bwd_allreduce_microstep: 64.97 | step_microstep: 182.06 [2024-07-31 12:46:35,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd: 27993.84 | bwd: 40700.94 | bwd_inner: 39531.98 | bwd_allreduce: 1168.47 | step: 182.65 44%|████▍ | 539/1230 [10:34:41<13:21:47, 69.62s/it] {'loss': 1.1379, 'learning_rate': 1.2462915416880199e-05, 'epoch': 0.44} 44%|████▍ | 539/1230 [10:34:41<13:21:47, 69.62s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 12:46:44,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.22 | bwd_microstep: 5574.59 | bwd_inner_microstep: 5518.05 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2800 [2024-07-31 12:46:53,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.79 | bwd_microstep: 5317.90 | bwd_inner_microstep: 4904.42 | bwd_allreduce_microstep: 413.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 12:47:02,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.99 | bwd_microstep: 5004.93 | bwd_inner_microstep: 4970.00 | bwd_allreduce_microstep: 34.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 12:47:11,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.16 | bwd_microstep: 5144.10 | bwd_inner_microstep: 5071.43 | bwd_allreduce_microstep: 72.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-31 12:47:19,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.14 | bwd_microstep: 5003.48 | bwd_inner_microstep: 4984.14 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2119 [2024-07-31 12:47:28,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.87 | bwd_microstep: 5114.46 | bwd_inner_microstep: 4718.64 | 
bwd_allreduce_microstep: 395.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 12:47:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.96 | bwd_microstep: 5055.27 | bwd_inner_microstep: 4996.97 | bwd_allreduce_microstep: 58.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-31 12:47:46,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 12:47:46,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.71 | bwd_microstep: 5009.79 | bwd_inner_microstep: 4973.50 | bwd_allreduce_microstep: 36.23 | step_microstep: 181.80 [2024-07-31 12:47:46,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29107.75 | bwd: 41224.50 | bwd_inner: 40137.08 | bwd_allreduce: 1086.93 | step: 182.38 44%|████▍ | 540/1230 [10:35:51<13:24:14, 69.93s/it] {'loss': 1.1966, 'learning_rate': 1.2437384539883715e-05, 'epoch': 0.44} 44%|████▍ | 540/1230 [10:35:51<13:24:14, 69.93s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2466 [2024-07-31 12:47:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.46 | bwd_microstep: 5563.19 | bwd_inner_microstep: 5137.71 | bwd_allreduce_microstep: 425.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3831 [2024-07-31 12:48:04,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.58 | bwd_microstep: 5040.64 | bwd_inner_microstep: 5021.34 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-31 12:48:12,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.94 | bwd_microstep: 5213.25 | bwd_inner_microstep: 4806.84 | bwd_allreduce_microstep: 406.34 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2204 [2024-07-31 12:48:21,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.26 | bwd_microstep: 5238.98 | bwd_inner_microstep: 4831.71 | bwd_allreduce_microstep: 407.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-31 12:48:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.84 | bwd_microstep: 5096.71 | bwd_inner_microstep: 4702.05 | bwd_allreduce_microstep: 394.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-31 12:48:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.50 | bwd_microstep: 4777.72 | bwd_inner_microstep: 4758.35 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 12:48:47,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.44 | bwd_microstep: 5046.24 | bwd_inner_microstep: 5018.51 | bwd_allreduce_microstep: 27.66 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2136 [2024-07-31 12:48:56,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 12:48:56,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.10 | bwd_microstep: 5113.53 | bwd_inner_microstep: 4714.45 | bwd_allreduce_microstep: 399.01 | step_microstep: 181.91 [2024-07-31 12:48:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28548.03 | bwd: 41090.25 | bwd_inner: 38990.89 | bwd_allreduce: 2098.86 | step: 182.50 44%|████▍ | 541/1230 [10:37:01<13:23:11, 69.94s/it] {'loss': 1.1681, 'learning_rate': 1.2411836760709686e-05, 'epoch': 0.44} 44%|████▍ | 541/1230 [10:37:01<13:23:11, 69.94s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 4083 [2024-07-31 12:49:05,102] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.95 | bwd_microstep: 5296.24 | bwd_inner_microstep: 5261.72 | bwd_allreduce_microstep: 34.45 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 12:49:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.47 | bwd_microstep: 5221.44 | bwd_inner_microstep: 5137.43 | bwd_allreduce_microstep: 83.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 12:49:22,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.03 | bwd_microstep: 5189.34 | bwd_inner_microstep: 5131.22 | bwd_allreduce_microstep: 58.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-31 12:49:31,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.30 | bwd_microstep: 5197.13 | bwd_inner_microstep: 5140.64 | bwd_allreduce_microstep: 56.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775 [2024-07-31 12:49:40,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.28 | bwd_microstep: 5169.28 | bwd_inner_microstep: 5116.18 | bwd_allreduce_microstep: 53.04 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-31 12:49:49,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.74 | bwd_microstep: 5126.00 | bwd_inner_microstep: 5056.59 | bwd_allreduce_microstep: 69.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3811 [2024-07-31 12:49:57,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.40 | bwd_microstep: 5010.19 | bwd_inner_microstep: 4961.70 | bwd_allreduce_microstep: 48.42 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 
5.0, dynamic token length: 2176 [2024-07-31 12:50:06,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 12:50:06,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.57 | bwd_microstep: 5242.14 | bwd_inner_microstep: 4835.21 | bwd_allreduce_microstep: 406.85 | step_microstep: 181.91 [2024-07-31 12:50:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28867.66 | bwd: 41451.72 | bwd_inner: 40640.62 | bwd_allreduce: 810.62 | step: 182.61 44%|████▍ | 542/1230 [10:38:12<13:24:27, 70.16s/it] {'loss': 1.1873, 'learning_rate': 1.2386272256520606e-05, 'epoch': 0.44} 44%|████▍ | 542/1230 [10:38:12<13:24:27, 70.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3855 [2024-07-31 12:50:16,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.61 | bwd_microstep: 5624.59 | bwd_inner_microstep: 5517.20 | bwd_allreduce_microstep: 107.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 12:50:25,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.37 | bwd_microstep: 5308.68 | bwd_inner_microstep: 5212.75 | bwd_allreduce_microstep: 95.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-31 12:50:33,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.71 | bwd_microstep: 5199.98 | bwd_inner_microstep: 4795.97 | bwd_allreduce_microstep: 403.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2117 [2024-07-31 12:50:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3081.53 | bwd_microstep: 5180.98 | bwd_inner_microstep: 4781.81 | bwd_allreduce_microstep: 399.09 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 12:50:50,157] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.39 | bwd_microstep: 4950.02 | bwd_inner_microstep: 4567.40 | bwd_allreduce_microstep: 382.55 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 12:50:58,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.86 | bwd_microstep: 4789.07 | bwd_inner_microstep: 4769.71 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-31 12:51:06,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.29 | bwd_microstep: 5010.42 | bwd_inner_microstep: 4991.06 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666 [2024-07-31 12:51:15,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 12:51:15,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3086.96 | bwd_microstep: 4871.82 | bwd_inner_microstep: 4824.31 | bwd_allreduce_microstep: 47.43 | step_microstep: 183.00 [2024-07-31 12:51:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27126.62 | bwd: 40935.55 | bwd_inner: 39460.15 | bwd_allreduce: 1474.87 | step: 183.83 44%|████▍ | 543/1230 [10:39:21<13:17:15, 69.63s/it] {'loss': 1.1749, 'learning_rate': 1.2360691204594934e-05, 'epoch': 0.44} 44%|████▍ | 543/1230 [10:39:21<13:17:15, 69.63s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096 [2024-07-31 12:51:24,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.94 | bwd_microstep: 5247.10 | bwd_inner_microstep: 5228.04 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266 [2024-07-31 12:51:33,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.12 | 
bwd_microstep: 5308.51 | bwd_inner_microstep: 4901.38 | bwd_allreduce_microstep: 407.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-31 12:51:41,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.16 | bwd_microstep: 5045.85 | bwd_inner_microstep: 5026.47 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-31 12:51:50,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.97 | bwd_microstep: 5047.73 | bwd_inner_microstep: 5021.97 | bwd_allreduce_microstep: 25.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-31 12:51:59,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.53 | bwd_microstep: 5181.90 | bwd_inner_microstep: 5126.47 | bwd_allreduce_microstep: 55.36 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-31 12:52:08,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.46 | bwd_microstep: 5069.56 | bwd_inner_microstep: 4676.03 | bwd_allreduce_microstep: 393.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 12:52:15,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.48 | bwd_microstep: 4726.50 | bwd_inner_microstep: 4695.30 | bwd_allreduce_microstep: 31.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 12:52:24,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 12:52:24,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.85 | bwd_microstep: 5158.35 | bwd_inner_microstep: 5083.99 | bwd_allreduce_microstep: 74.29 | step_microstep: 181.83 [2024-07-31 
12:52:24,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28686.42 | bwd: 40785.48 | bwd_inner: 39759.60 | bwd_allreduce: 1025.38 | step: 182.55 44%|████▍ | 544/1230 [10:40:30<13:16:41, 69.68s/it] {'loss': 1.2204, 'learning_rate': 1.2335093782325889e-05, 'epoch': 0.44} 44%|████▍ | 544/1230 [10:40:30<13:16:41, 69.68s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820 [2024-07-31 12:52:33,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.36 | bwd_microstep: 5185.82 | bwd_inner_microstep: 5158.42 | bwd_allreduce_microstep: 27.33 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3585 [2024-07-31 12:52:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3128.03 | bwd_microstep: 5031.57 | bwd_inner_microstep: 4964.27 | bwd_allreduce_microstep: 67.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 12:52:50,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.62 | bwd_microstep: 5021.56 | bwd_inner_microstep: 4996.50 | bwd_allreduce_microstep: 25.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3616 [2024-07-31 12:52:59,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.31 | bwd_microstep: 5129.22 | bwd_inner_microstep: 5039.79 | bwd_allreduce_microstep: 89.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 12:53:08,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.31 | bwd_microstep: 5001.95 | bwd_inner_microstep: 4951.34 | bwd_allreduce_microstep: 50.53 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 12:53:16,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.21 | 
bwd_microstep: 5065.50 | bwd_inner_microstep: 5006.07 | bwd_allreduce_microstep: 59.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 12:53:24,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.78 | bwd_microstep: 4783.33 | bwd_inner_microstep: 4749.04 | bwd_allreduce_microstep: 34.22 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-31 12:53:33,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 12:53:33,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.54 | bwd_microstep: 5028.52 | bwd_inner_microstep: 4989.92 | bwd_allreduce_microstep: 38.54 | step_microstep: 183.08 [2024-07-31 12:53:33,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28209.07 | bwd: 40247.45 | bwd_inner: 39855.28 | bwd_allreduce: 391.69 | step: 183.80 44%|████▍ | 545/1230 [10:41:39<13:12:29, 69.41s/it] {'loss': 1.1876, 'learning_rate': 1.2309480167220203e-05, 'epoch': 0.44} 44%|████▍ | 545/1230 [10:41:39<13:12:29, 69.41s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4036 [2024-07-31 12:53:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.72 | bwd_microstep: 5370.69 | bwd_inner_microstep: 5351.72 | bwd_allreduce_microstep: 18.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 12:53:51,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.23 | bwd_microstep: 5100.75 | bwd_inner_microstep: 5031.73 | bwd_allreduce_microstep: 68.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 12:54:00,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.49 | bwd_microstep: 5229.08 | bwd_inner_microstep: 5171.66 | bwd_allreduce_microstep: 
57.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 12:54:09,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.24 | bwd_microstep: 5138.93 | bwd_inner_microstep: 5064.90 | bwd_allreduce_microstep: 73.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 12:54:18,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.77 | bwd_microstep: 4987.19 | bwd_inner_microstep: 4967.85 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 12:54:26,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.60 | bwd_microstep: 5207.00 | bwd_inner_microstep: 4801.29 | bwd_allreduce_microstep: 405.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-31 12:54:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.68 | bwd_microstep: 4793.06 | bwd_inner_microstep: 4759.96 | bwd_allreduce_microstep: 33.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2142 [2024-07-31 12:54:43,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 12:54:43,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3449.99 | bwd_microstep: 5025.46 | bwd_inner_microstep: 4634.24 | bwd_allreduce_microstep: 391.14 | step_microstep: 181.77 [2024-07-31 12:54:43,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28566.63 | bwd: 40852.14 | bwd_inner: 39783.29 | bwd_allreduce: 1068.36 | step: 182.35 44%|████▍ | 546/1230 [10:42:49<13:12:28, 69.51s/it] {'loss': 1.1788, 'learning_rate': 1.2283850536896907e-05, 'epoch': 0.44} 44%|████▍ | 546/1230 [10:42:49<13:12:28, 69.51s/it]dynamic ViT batch size: 26, images per sample: 
13.0, dynamic token length: 3999 [2024-07-31 12:54:52,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.40 | bwd_microstep: 5251.49 | bwd_inner_microstep: 5232.36 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-07-31 12:55:01,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.52 | bwd_microstep: 5219.76 | bwd_inner_microstep: 5152.27 | bwd_allreduce_microstep: 67.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-31 12:55:10,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.40 | bwd_microstep: 5032.93 | bwd_inner_microstep: 5013.66 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-31 12:55:19,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.01 | bwd_microstep: 5040.74 | bwd_inner_microstep: 4980.58 | bwd_allreduce_microstep: 60.09 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3645 [2024-07-31 12:55:27,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.33 | bwd_microstep: 4973.03 | bwd_inner_microstep: 4936.51 | bwd_allreduce_microstep: 36.45 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 12:55:36,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.19 | bwd_microstep: 5133.84 | bwd_inner_microstep: 5063.20 | bwd_allreduce_microstep: 70.57 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-07-31 12:55:44,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.95 | bwd_microstep: 4871.72 | bwd_inner_microstep: 4852.45 | bwd_allreduce_microstep: 19.21 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 12:55:53,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 12:55:53,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.79 | bwd_microstep: 5014.57 | bwd_inner_microstep: 4960.49 | bwd_allreduce_microstep: 54.01 | step_microstep: 181.83 [2024-07-31 12:55:53,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29373.51 | bwd: 40538.04 | bwd_inner: 40191.46 | bwd_allreduce: 346.06 | step: 182.54 44%|████▍ | 547/1230 [10:43:59<13:13:48, 69.73s/it] {'loss': 1.2119, 'learning_rate': 1.2258205069086082e-05, 'epoch': 0.44} 44%|████▍ | 547/1230 [10:43:59<13:13:48, 69.73s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2161 [2024-07-31 12:56:02,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.05 | bwd_microstep: 5437.69 | bwd_inner_microstep: 5019.35 | bwd_allreduce_microstep: 418.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-07-31 12:56:11,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.57 | bwd_microstep: 5286.06 | bwd_inner_microstep: 4876.81 | bwd_allreduce_microstep: 409.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 12:56:20,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.13 | bwd_microstep: 4969.60 | bwd_inner_microstep: 4950.24 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 12:56:29,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.47 | bwd_microstep: 5205.30 | bwd_inner_microstep: 5124.47 | bwd_allreduce_microstep: 80.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3811 [2024-07-31 12:56:37,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.93 | bwd_microstep: 5038.53 | bwd_inner_microstep: 5019.14 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-31 12:56:46,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.62 | bwd_microstep: 4988.08 | bwd_inner_microstep: 4968.73 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3840 [2024-07-31 12:56:55,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.25 | bwd_microstep: 4898.21 | bwd_inner_microstep: 4878.90 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 12:57:03,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 12:57:03,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.75 | bwd_microstep: 4994.02 | bwd_inner_microstep: 4937.34 | bwd_allreduce_microstep: 56.61 | step_microstep: 181.32 [2024-07-31 12:57:03,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29116.68 | bwd: 40817.47 | bwd_inner: 39774.92 | bwd_allreduce: 1042.06 | step: 181.90 45%|████▍ | 548/1230 [10:45:09<13:14:26, 69.89s/it] {'loss': 1.1518, 'learning_rate': 1.2232543941627643e-05, 'epoch': 0.45} 45%|████▍ | 548/1230 [10:45:09<13:14:26, 69.89s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2544 [2024-07-31 12:57:13,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.34 | bwd_microstep: 5455.88 | bwd_inner_microstep: 5037.06 | bwd_allreduce_microstep: 418.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-31 12:57:21,579] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3358.52 | bwd_microstep: 5096.78 | bwd_inner_microstep: 5058.29 | bwd_allreduce_microstep: 38.42 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2099 [2024-07-31 12:57:29,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3294.17 | bwd_microstep: 4972.73 | bwd_inner_microstep: 4588.33 | bwd_allreduce_microstep: 384.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 12:57:38,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.00 | bwd_microstep: 4999.40 | bwd_inner_microstep: 4980.08 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 12:57:47,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.59 | bwd_microstep: 5262.83 | bwd_inner_microstep: 4855.21 | bwd_allreduce_microstep: 407.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3697 [2024-07-31 12:57:55,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3121.60 | bwd_microstep: 4997.08 | bwd_inner_microstep: 4937.04 | bwd_allreduce_microstep: 59.96 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 12:58:04,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.78 | bwd_microstep: 4997.28 | bwd_inner_microstep: 4959.59 | bwd_allreduce_microstep: 37.62 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2960 [2024-07-31 12:58:13,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 12:58:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.87 | bwd_microstep: 5021.31 | bwd_inner_microstep: 4764.52 | bwd_allreduce_microstep: 256.72 | 
step_microstep: 181.55 [2024-07-31 12:58:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27934.77 | bwd: 40803.28 | bwd_inner: 39180.07 | bwd_allreduce: 1622.70 | step: 182.15 45%|████▍ | 549/1230 [10:46:18<13:10:29, 69.65s/it] {'loss': 1.1658, 'learning_rate': 1.2206867332470091e-05, 'epoch': 0.45} 45%|████▍ | 549/1230 [10:46:18<13:10:29, 69.65s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3904 [2024-07-31 12:58:22,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.76 | bwd_microstep: 5611.83 | bwd_inner_microstep: 5508.45 | bwd_allreduce_microstep: 103.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 12:58:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.40 | bwd_microstep: 5097.52 | bwd_inner_microstep: 5057.09 | bwd_allreduce_microstep: 40.36 | step_microstep: 0.18 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2066 [2024-07-31 12:58:39,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.24 | bwd_microstep: 5249.19 | bwd_inner_microstep: 4841.20 | bwd_allreduce_microstep: 407.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-07-31 12:58:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.17 | bwd_microstep: 5003.80 | bwd_inner_microstep: 4984.40 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 12:58:56,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3177.82 | bwd_microstep: 4686.79 | bwd_inner_microstep: 4665.51 | bwd_allreduce_microstep: 21.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 12:59:05,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3549.97 | bwd_microstep: 5025.71 | bwd_inner_microstep: 4969.43 | bwd_allreduce_microstep: 56.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 12:59:14,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.59 | bwd_microstep: 5205.49 | bwd_inner_microstep: 5079.32 | bwd_allreduce_microstep: 126.11 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 12:59:22,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 12:59:22,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.72 | bwd_microstep: 4914.61 | bwd_inner_microstep: 4894.55 | bwd_allreduce_microstep: 19.99 | step_microstep: 182.06 [2024-07-31 12:59:22,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28731.57 | bwd: 40794.93 | bwd_inner: 39999.87 | bwd_allreduce: 794.57 | step: 182.76 45%|████▍ | 550/1230 [10:47:28<13:10:03, 69.71s/it] {'loss': 1.1896, 'learning_rate': 1.2181175419669292e-05, 'epoch': 0.45} 45%|████▍ | 550/1230 [10:47:28<13:10:03, 69.71s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3852 [2024-07-31 12:59:31,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3819.54 | bwd_microstep: 5205.86 | bwd_inner_microstep: 5175.71 | bwd_allreduce_microstep: 30.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-31 12:59:40,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.87 | bwd_microstep: 5294.71 | bwd_inner_microstep: 4884.54 | bwd_allreduce_microstep: 410.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 12:59:49,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.15 | bwd_microstep: 5069.68 | bwd_inner_microstep: 5040.61 
| bwd_allreduce_microstep: 29.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 12:59:58,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.28 | bwd_microstep: 4987.54 | bwd_inner_microstep: 4968.19 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3698 [2024-07-31 13:00:07,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.31 | bwd_microstep: 5134.64 | bwd_inner_microstep: 5079.84 | bwd_allreduce_microstep: 54.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 13:00:15,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.03 | bwd_microstep: 4930.39 | bwd_inner_microstep: 4903.42 | bwd_allreduce_microstep: 26.90 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669 [2024-07-31 13:00:24,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.81 | bwd_microstep: 5070.66 | bwd_inner_microstep: 4990.21 | bwd_allreduce_microstep: 80.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 13:00:32,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 13:00:32,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3186.23 | bwd_microstep: 4689.75 | bwd_inner_microstep: 4670.35 | bwd_allreduce_microstep: 19.33 | step_microstep: 182.40 [2024-07-31 13:00:32,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28959.12 | bwd: 40383.22 | bwd_inner: 39712.81 | bwd_allreduce: 669.92 | step: 182.99 45%|████▍ | 551/1230 [10:48:38<13:08:46, 69.70s/it] {'loss': 1.2005, 'learning_rate': 1.215546838138723e-05, 'epoch': 0.45} 45%|████▍ | 551/1230 [10:48:38<13:08:46, 69.70s/it]dynamic ViT batch size: 
23, images per sample: 11.5, dynamic token length: 4065 [2024-07-31 13:00:41,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.71 | bwd_microstep: 5242.57 | bwd_inner_microstep: 5223.56 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3908 [2024-07-31 13:00:50,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.59 | bwd_microstep: 5295.31 | bwd_inner_microstep: 5255.23 | bwd_allreduce_microstep: 40.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793 [2024-07-31 13:00:59,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.56 | bwd_microstep: 5245.84 | bwd_inner_microstep: 5186.37 | bwd_allreduce_microstep: 59.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 13:01:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.89 | bwd_microstep: 5055.82 | bwd_inner_microstep: 4667.12 | bwd_allreduce_microstep: 388.63 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3655 [2024-07-31 13:01:15,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3136.59 | bwd_microstep: 4849.38 | bwd_inner_microstep: 4806.22 | bwd_allreduce_microstep: 43.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 13:01:24,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.96 | bwd_microstep: 5070.07 | bwd_inner_microstep: 5010.02 | bwd_allreduce_microstep: 59.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 13:01:32,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.29 | bwd_microstep: 4905.45 | bwd_inner_microstep: 4886.10 | 
bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-31 13:01:41,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-31 13:01:41,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.09 | bwd_microstep: 5163.73 | bwd_inner_microstep: 4762.26 | bwd_allreduce_microstep: 401.40 | step_microstep: 182.12 [2024-07-31 13:01:41,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28075.57 | bwd: 40828.15 | bwd_inner: 39796.81 | bwd_allreduce: 1030.86 | step: 182.70 45%|████▍ | 552/1230 [10:49:47<13:06:03, 69.56s/it] {'loss': 1.1362, 'learning_rate': 1.212974639589078e-05, 'epoch': 0.45} 45%|████▍ | 552/1230 [10:49:47<13:06:03, 69.56s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3889 [2024-07-31 13:01:50,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3834.76 | bwd_microstep: 5267.58 | bwd_inner_microstep: 5229.87 | bwd_allreduce_microstep: 37.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573 [2024-07-31 13:01:59,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.41 | bwd_microstep: 5245.23 | bwd_inner_microstep: 5149.36 | bwd_allreduce_microstep: 95.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-31 13:02:08,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.99 | bwd_microstep: 5190.95 | bwd_inner_microstep: 5138.83 | bwd_allreduce_microstep: 52.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 13:02:17,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.65 | bwd_microstep: 5002.72 | bwd_inner_microstep: 4983.35 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 13:02:26,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.79 | bwd_microstep: 4995.60 | bwd_inner_microstep: 4976.31 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-31 13:02:34,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.75 | bwd_microstep: 5054.33 | bwd_inner_microstep: 4662.67 | bwd_allreduce_microstep: 391.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2100 [2024-07-31 13:02:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.80 | bwd_microstep: 4904.24 | bwd_inner_microstep: 4525.59 | bwd_allreduce_microstep: 378.59 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 13:02:51,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 13:02:51,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.74 | bwd_microstep: 4961.04 | bwd_inner_microstep: 4912.21 | bwd_allreduce_microstep: 48.76 | step_microstep: 181.67 [2024-07-31 13:02:51,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28556.79 | bwd: 40621.68 | bwd_inner: 39578.13 | bwd_allreduce: 1043.06 | step: 182.25 45%|████▍ | 553/1230 [10:50:57<13:04:43, 69.55s/it] {'loss': 1.1699, 'learning_rate': 1.2104009641550472e-05, 'epoch': 0.45} 45%|████▍ | 553/1230 [10:50:57<13:04:43, 69.55s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 13:03:00,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.78 | bwd_microstep: 5195.47 | bwd_inner_microstep: 5176.44 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3931 [2024-07-31 
13:03:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.11 | bwd_microstep: 5470.91 | bwd_inner_microstep: 5408.15 | bwd_allreduce_microstep: 62.69 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2239 [2024-07-31 13:03:18,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.08 | bwd_microstep: 5255.43 | bwd_inner_microstep: 4847.96 | bwd_allreduce_microstep: 407.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 13:03:27,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.01 | bwd_microstep: 5012.80 | bwd_inner_microstep: 4992.76 | bwd_allreduce_microstep: 19.97 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-31 13:03:35,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.98 | bwd_microstep: 5221.11 | bwd_inner_microstep: 4813.02 | bwd_allreduce_microstep: 408.02 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 13:03:44,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.84 | bwd_microstep: 4886.88 | bwd_inner_microstep: 4867.37 | bwd_allreduce_microstep: 19.44 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2116 [2024-07-31 13:03:53,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.90 | bwd_microstep: 5061.86 | bwd_inner_microstep: 4669.33 | bwd_allreduce_microstep: 392.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-31 13:04:01,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:04:01,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.85 | bwd_microstep: 5135.93 | 
bwd_inner_microstep: 4739.96 | bwd_allreduce_microstep: 395.91 | step_microstep: 182.03 [2024-07-31 13:04:01,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29011.46 | bwd: 41240.38 | bwd_inner: 39514.94 | bwd_allreduce: 1724.95 | step: 182.74 45%|████▌ | 554/1230 [10:52:07<13:07:04, 69.86s/it] {'loss': 1.1544, 'learning_rate': 1.2078258296839245e-05, 'epoch': 0.45} 45%|████▌ | 554/1230 [10:52:07<13:07:04, 69.86s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2439 [2024-07-31 13:04:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.13 | bwd_microstep: 5329.18 | bwd_inner_microstep: 4919.87 | bwd_allreduce_microstep: 409.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 13:04:19,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3379.06 | bwd_microstep: 5091.26 | bwd_inner_microstep: 5053.91 | bwd_allreduce_microstep: 37.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-31 13:04:28,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.65 | bwd_microstep: 5314.60 | bwd_inner_microstep: 5242.93 | bwd_allreduce_microstep: 71.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 13:04:37,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.63 | bwd_microstep: 5206.94 | bwd_inner_microstep: 5144.31 | bwd_allreduce_microstep: 62.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 13:04:45,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.01 | bwd_microstep: 4950.57 | bwd_inner_microstep: 4917.82 | bwd_allreduce_microstep: 32.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 
13:04:54,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.08 | bwd_microstep: 5083.34 | bwd_inner_microstep: 4690.32 | bwd_allreduce_microstep: 392.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-31 13:05:03,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.09 | bwd_microstep: 4992.64 | bwd_inner_microstep: 4956.65 | bwd_allreduce_microstep: 35.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 13:05:11,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 13:05:11,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.68 | bwd_microstep: 5004.97 | bwd_inner_microstep: 4954.27 | bwd_allreduce_microstep: 50.63 | step_microstep: 182.85 [2024-07-31 13:05:11,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28575.23 | bwd: 40973.48 | bwd_inner: 39880.02 | bwd_allreduce: 1092.97 | step: 183.43 45%|████▌ | 555/1230 [10:53:17<13:05:59, 69.87s/it] {'loss': 1.1455, 'learning_rate': 1.205249254033122e-05, 'epoch': 0.45} 45%|████▌ | 555/1230 [10:53:17<13:05:59, 69.87s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3947 [2024-07-31 13:05:20,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.45 | bwd_microstep: 5174.92 | bwd_inner_microstep: 5155.85 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3865 [2024-07-31 13:05:29,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.92 | bwd_microstep: 5277.44 | bwd_inner_microstep: 5220.32 | bwd_allreduce_microstep: 57.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-31 13:05:38,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3813.55 | bwd_microstep: 5152.04 | bwd_inner_microstep: 5132.66 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 13:05:47,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.39 | bwd_microstep: 5185.21 | bwd_inner_microstep: 5124.90 | bwd_allreduce_microstep: 60.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2207 [2024-07-31 13:05:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.44 | bwd_microstep: 5223.18 | bwd_inner_microstep: 4815.65 | bwd_allreduce_microstep: 407.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 13:06:05,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.79 | bwd_microstep: 5003.46 | bwd_inner_microstep: 4984.15 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 13:06:13,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.85 | bwd_microstep: 4999.70 | bwd_inner_microstep: 4943.70 | bwd_allreduce_microstep: 55.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 13:06:22,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:06:22,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.69 | bwd_microstep: 4991.89 | bwd_inner_microstep: 4939.46 | bwd_allreduce_microstep: 52.36 | step_microstep: 181.97 [2024-07-31 13:06:22,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29239.99 | bwd: 41007.81 | bwd_inner: 40316.63 | bwd_allreduce: 690.69 | step: 182.55 45%|████▌ | 556/1230 [10:54:28<13:07:13, 70.08s/it] {'loss': 1.1757, 'learning_rate': 1.2026712550700457e-05, 'epoch': 
0.45} 45%|████▌ | 556/1230 [10:54:28<13:07:13, 70.08s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3950 [2024-07-31 13:06:31,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3831.99 | bwd_microstep: 5233.89 | bwd_inner_microstep: 5214.85 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2048 [2024-07-31 13:06:40,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.91 | bwd_microstep: 5269.49 | bwd_inner_microstep: 4863.55 | bwd_allreduce_microstep: 405.88 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-07-31 13:06:48,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.97 | bwd_microstep: 5006.52 | bwd_inner_microstep: 4618.92 | bwd_allreduce_microstep: 387.53 | step_microstep: 0.10 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2098 [2024-07-31 13:06:57,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.90 | bwd_microstep: 5161.03 | bwd_inner_microstep: 4760.40 | bwd_allreduce_microstep: 400.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 13:07:05,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.35 | bwd_microstep: 5030.08 | bwd_inner_microstep: 5005.77 | bwd_allreduce_microstep: 24.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 13:07:14,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.31 | bwd_microstep: 5050.24 | bwd_inner_microstep: 5006.27 | bwd_allreduce_microstep: 43.90 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3706 [2024-07-31 13:07:22,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3093.15 
| bwd_microstep: 4870.74 | bwd_inner_microstep: 4828.98 | bwd_allreduce_microstep: 41.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 13:07:31,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 13:07:31,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.14 | bwd_microstep: 4996.80 | bwd_inner_microstep: 4943.41 | bwd_allreduce_microstep: 53.33 | step_microstep: 181.31 [2024-07-31 13:07:31,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27924.61 | bwd: 40618.78 | bwd_inner: 39242.07 | bwd_allreduce: 1376.20 | step: 181.92 45%|████▌ | 557/1230 [10:55:37<13:02:00, 69.72s/it] {'loss': 1.1273, 'learning_rate': 1.200091850671972e-05, 'epoch': 0.45} 45%|████▌ | 557/1230 [10:55:37<13:02:00, 69.72s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 13:07:40,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.26 | bwd_microstep: 5391.76 | bwd_inner_microstep: 5372.69 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.09 dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2806 [2024-07-31 13:07:48,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3077.87 | bwd_microstep: 5009.38 | bwd_inner_microstep: 4622.94 | bwd_allreduce_microstep: 386.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-07-31 13:07:57,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.69 | bwd_microstep: 5265.92 | bwd_inner_microstep: 4857.44 | bwd_allreduce_microstep: 408.41 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3819 [2024-07-31 13:08:05,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3279.28 | bwd_microstep: 4984.13 | bwd_inner_microstep: 4947.44 | 
bwd_allreduce_microstep: 36.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-31 13:08:14,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.76 | bwd_microstep: 5176.57 | bwd_inner_microstep: 5120.86 | bwd_allreduce_microstep: 55.64 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 13:08:23,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.75 | bwd_microstep: 5228.04 | bwd_inner_microstep: 5145.69 | bwd_allreduce_microstep: 82.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-31 13:08:32,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.45 | bwd_microstep: 4994.79 | bwd_inner_microstep: 4975.37 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 13:08:41,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 13:08:41,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.28 | bwd_microstep: 5041.89 | bwd_inner_microstep: 4983.53 | bwd_allreduce_microstep: 58.29 | step_microstep: 181.40 [2024-07-31 13:08:41,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28359.23 | bwd: 41092.44 | bwd_inner: 40025.92 | bwd_allreduce: 1066.04 | step: 182.11 45%|████▌ | 558/1230 [10:56:46<13:01:04, 69.74s/it] {'loss': 1.1708, 'learning_rate': 1.1975110587259222e-05, 'epoch': 0.45} 45%|████▌ | 558/1230 [10:56:46<13:01:04, 69.74s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3987 [2024-07-31 13:08:50,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.14 | bwd_microstep: 5286.86 | bwd_inner_microstep: 5267.67 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 3604 [2024-07-31 13:08:59,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.00 | bwd_microstep: 5332.15 | bwd_inner_microstep: 5178.60 | bwd_allreduce_microstep: 153.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 13:09:07,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.65 | bwd_microstep: 5122.71 | bwd_inner_microstep: 5054.01 | bwd_allreduce_microstep: 68.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 13:09:16,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.39 | bwd_microstep: 5220.58 | bwd_inner_microstep: 4816.19 | bwd_allreduce_microstep: 404.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 13:09:25,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.58 | bwd_microstep: 5224.29 | bwd_inner_microstep: 5104.18 | bwd_allreduce_microstep: 120.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-31 13:09:34,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.44 | bwd_microstep: 4983.47 | bwd_inner_microstep: 4964.13 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 13:09:43,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.27 | bwd_microstep: 5158.74 | bwd_inner_microstep: 5103.61 | bwd_allreduce_microstep: 55.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3659 [2024-07-31 13:09:51,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 13:09:51,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3675.20 | bwd_microstep: 4883.50 | bwd_inner_microstep: 4858.33 | bwd_allreduce_microstep: 25.10 | step_microstep: 181.84 [2024-07-31 13:09:51,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29275.59 | bwd: 41212.27 | bwd_inner: 40346.66 | bwd_allreduce: 865.12 | step: 182.43 45%|████▌ | 559/1230 [10:57:57<13:03:32, 70.06s/it] {'loss': 1.1899, 'learning_rate': 1.1949288971285413e-05, 'epoch': 0.45} 45%|████▌ | 559/1230 [10:57:57<13:03:32, 70.06s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3960 [2024-07-31 13:10:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.31 | bwd_microstep: 5304.60 | bwd_inner_microstep: 5279.10 | bwd_allreduce_microstep: 25.42 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-31 13:10:09,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.48 | bwd_microstep: 5138.00 | bwd_inner_microstep: 5060.24 | bwd_allreduce_microstep: 77.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3562 [2024-07-31 13:10:18,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.75 | bwd_microstep: 5147.03 | bwd_inner_microstep: 5061.32 | bwd_allreduce_microstep: 85.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 13:10:26,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3460.46 | bwd_microstep: 4902.25 | bwd_inner_microstep: 4873.82 | bwd_allreduce_microstep: 28.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 13:10:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.07 | bwd_microstep: 5173.69 | bwd_inner_microstep: 5094.88 | bwd_allreduce_microstep: 78.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3637 [2024-07-31 13:10:44,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.01 | bwd_microstep: 5007.37 | bwd_inner_microstep: 4951.42 | bwd_allreduce_microstep: 55.88 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 13:10:52,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.40 | bwd_microstep: 4925.92 | bwd_inner_microstep: 4900.44 | bwd_allreduce_microstep: 25.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 13:11:01,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 13:11:01,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.49 | bwd_microstep: 4927.50 | bwd_inner_microstep: 4902.51 | bwd_allreduce_microstep: 24.91 | step_microstep: 182.73 [2024-07-31 13:11:01,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29035.86 | bwd: 40526.33 | bwd_inner: 40123.68 | bwd_allreduce: 402.16 | step: 183.31 46%|████▌ | 560/1230 [10:59:07<13:01:49, 70.01s/it] {'loss': 1.1923, 'learning_rate': 1.1923453837859706e-05, 'epoch': 0.46} 46%|████▌ | 560/1230 [10:59:07<13:01:49, 70.01s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3991 [2024-07-31 13:11:10,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.18 | bwd_microstep: 5471.92 | bwd_inner_microstep: 5400.53 | bwd_allreduce_microstep: 71.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879 [2024-07-31 13:11:19,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.08 | bwd_microstep: 5177.32 | bwd_inner_microstep: 5131.21 | bwd_allreduce_microstep: 46.04 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 13:11:28,550] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.47 | bwd_microstep: 5163.33 | bwd_inner_microstep: 5083.17 | bwd_allreduce_microstep: 80.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 13:11:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.38 | bwd_microstep: 5269.68 | bwd_inner_microstep: 4861.83 | bwd_allreduce_microstep: 407.78 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3628 [2024-07-31 13:11:46,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.46 | bwd_microstep: 5241.06 | bwd_inner_microstep: 5131.26 | bwd_allreduce_microstep: 109.74 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 13:11:54,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.30 | bwd_microstep: 5090.28 | bwd_inner_microstep: 4695.86 | bwd_allreduce_microstep: 394.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3853 [2024-07-31 13:12:03,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.19 | bwd_microstep: 5110.53 | bwd_inner_microstep: 5091.14 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 13:12:11,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 13:12:11,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3188.19 | bwd_microstep: 4683.71 | bwd_inner_microstep: 4663.44 | bwd_allreduce_microstep: 20.20 | step_microstep: 182.02 [2024-07-31 13:12:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28582.16 | bwd: 41207.80 | bwd_inner: 40058.36 | bwd_allreduce: 1148.95 | step: 182.72 46%|████▌ | 561/1230 [11:00:17<13:01:00, 70.05s/it] {'loss': 1.1824, 
'learning_rate': 1.1897605366137262e-05, 'epoch': 0.46} 46%|████▌ | 561/1230 [11:00:17<13:01:00, 70.05s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4084 [2024-07-31 13:12:20,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.74 | bwd_microstep: 5294.85 | bwd_inner_microstep: 5269.64 | bwd_allreduce_microstep: 25.13 | step_microstep: 0.10 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2808 [2024-07-31 13:12:29,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.46 | bwd_microstep: 5103.75 | bwd_inner_microstep: 4708.39 | bwd_allreduce_microstep: 395.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 13:12:38,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.74 | bwd_microstep: 4973.79 | bwd_inner_microstep: 4919.66 | bwd_allreduce_microstep: 54.06 | step_microstep: 0.08 dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2835 [2024-07-31 13:12:46,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.72 | bwd_microstep: 5228.52 | bwd_inner_microstep: 4820.96 | bwd_allreduce_microstep: 407.49 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3857 [2024-07-31 13:12:55,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.84 | bwd_microstep: 5268.12 | bwd_inner_microstep: 5193.90 | bwd_allreduce_microstep: 74.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 13:13:04,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.31 | bwd_microstep: 5045.36 | bwd_inner_microstep: 4982.23 | bwd_allreduce_microstep: 63.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 13:13:13,110] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.05 | bwd_microstep: 5053.57 | bwd_inner_microstep: 4992.04 | bwd_allreduce_microstep: 61.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 13:13:21,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-31 13:13:21,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.81 | bwd_microstep: 4902.60 | bwd_inner_microstep: 4883.29 | bwd_allreduce_microstep: 19.24 | step_microstep: 183.06 [2024-07-31 13:13:21,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28828.58 | bwd: 40870.53 | bwd_inner: 39770.06 | bwd_allreduce: 1099.98 | step: 183.67 46%|████▌ | 562/1230 [11:01:27<12:59:47, 70.04s/it] {'loss': 1.1466, 'learning_rate': 1.1871743735365735e-05, 'epoch': 0.46} 46%|████▌ | 562/1230 [11:01:27<12:59:47, 70.04s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3868 [2024-07-31 13:13:31,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.98 | bwd_microstep: 5386.21 | bwd_inner_microstep: 5315.84 | bwd_allreduce_microstep: 70.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-31 13:13:39,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.93 | bwd_microstep: 5309.41 | bwd_inner_microstep: 4899.61 | bwd_allreduce_microstep: 409.72 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-07-31 13:13:48,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.00 | bwd_microstep: 5156.09 | bwd_inner_microstep: 5111.31 | bwd_allreduce_microstep: 44.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 13:13:57,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.79 | 
bwd_microstep: 5214.23 | bwd_inner_microstep: 4811.18 | bwd_allreduce_microstep: 402.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 13:14:06,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.31 | bwd_microstep: 4972.01 | bwd_inner_microstep: 4936.95 | bwd_allreduce_microstep: 34.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 13:14:14,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.38 | bwd_microstep: 4959.02 | bwd_inner_microstep: 4924.77 | bwd_allreduce_microstep: 34.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-31 13:14:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.45 | bwd_microstep: 4945.89 | bwd_inner_microstep: 4915.29 | bwd_allreduce_microstep: 30.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 13:14:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 13:14:32,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.29 | bwd_microstep: 5358.48 | bwd_inner_microstep: 5234.19 | bwd_allreduce_microstep: 124.22 | step_microstep: 181.72 [2024-07-31 13:14:32,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29139.04 | bwd: 41301.32 | bwd_inner: 40149.09 | bwd_allreduce: 1151.75 | step: 182.31 46%|████▌ | 563/1230 [11:02:38<13:01:03, 70.26s/it] {'loss': 1.1514, 'learning_rate': 1.1845869124884027e-05, 'epoch': 0.46} 46%|████▌ | 563/1230 [11:02:38<13:01:03, 70.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-31 13:14:41,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.98 | bwd_microstep: 5519.54 | bwd_inner_microstep: 5426.52 | 
bwd_allreduce_microstep: 92.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-31 13:14:50,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.95 | bwd_microstep: 5285.45 | bwd_inner_microstep: 5221.91 | bwd_allreduce_microstep: 63.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-31 13:14:59,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.35 | bwd_microstep: 5307.46 | bwd_inner_microstep: 5203.09 | bwd_allreduce_microstep: 104.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-31 13:15:08,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.76 | bwd_microstep: 5008.46 | bwd_inner_microstep: 4989.13 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 13:15:17,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.99 | bwd_microstep: 5271.17 | bwd_inner_microstep: 4861.90 | bwd_allreduce_microstep: 409.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 13:15:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.24 | bwd_microstep: 4816.46 | bwd_inner_microstep: 4778.37 | bwd_allreduce_microstep: 38.03 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2115 [2024-07-31 13:15:34,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.39 | bwd_microstep: 5265.78 | bwd_inner_microstep: 4858.62 | bwd_allreduce_microstep: 407.09 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3691 [2024-07-31 13:15:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 
[2024-07-31 13:15:43,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.08 | bwd_microstep: 5131.72 | bwd_inner_microstep: 5044.05 | bwd_allreduce_microstep: 87.60 | step_microstep: 181.93 [2024-07-31 13:15:43,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28666.64 | bwd: 41606.03 | bwd_inner: 40383.53 | bwd_allreduce: 1222.01 | step: 182.52 46%|████▌ | 564/1230 [11:03:49<13:01:01, 70.36s/it] {'loss': 1.182, 'learning_rate': 1.1819981714121054e-05, 'epoch': 0.46} 46%|████▌ | 564/1230 [11:03:49<13:01:01, 70.36s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 13:15:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.52 | bwd_microstep: 5488.94 | bwd_inner_microstep: 5399.97 | bwd_allreduce_microstep: 88.91 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-07-31 13:16:01,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.90 | bwd_microstep: 5180.39 | bwd_inner_microstep: 5123.58 | bwd_allreduce_microstep: 56.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-31 13:16:10,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.37 | bwd_microstep: 5389.34 | bwd_inner_microstep: 5307.11 | bwd_allreduce_microstep: 82.17 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801 [2024-07-31 13:16:19,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.28 | bwd_microstep: 5023.23 | bwd_inner_microstep: 5003.97 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 13:16:27,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.17 | bwd_microstep: 5106.53 | bwd_inner_microstep: 5031.27 | 
bwd_allreduce_microstep: 75.19 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 13:16:35,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3189.80 | bwd_microstep: 4704.90 | bwd_inner_microstep: 4682.03 | bwd_allreduce_microstep: 22.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669 [2024-07-31 13:16:44,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.02 | bwd_microstep: 4904.40 | bwd_inner_microstep: 4882.64 | bwd_allreduce_microstep: 21.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 13:16:53,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 13:16:53,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.76 | bwd_microstep: 4983.70 | bwd_inner_microstep: 4934.05 | bwd_allreduce_microstep: 49.58 | step_microstep: 181.98 [2024-07-31 13:16:53,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28795.73 | bwd: 40781.42 | bwd_inner: 40364.55 | bwd_allreduce: 416.39 | step: 182.68 46%|████▌ | 565/1230 [11:04:59<12:58:21, 70.23s/it] {'loss': 1.1404, 'learning_rate': 1.1794081682594491e-05, 'epoch': 0.46} 46%|████▌ | 565/1230 [11:04:59<12:58:21, 70.23s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4003 [2024-07-31 13:17:02,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.62 | bwd_microstep: 5255.32 | bwd_inner_microstep: 5236.30 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-31 13:17:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.67 | bwd_microstep: 5037.56 | bwd_inner_microstep: 5018.27 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 13:17:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.90 | bwd_microstep: 5125.90 | bwd_inner_microstep: 5093.00 | bwd_allreduce_microstep: 32.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 13:17:28,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.49 | bwd_microstep: 5114.32 | bwd_inner_microstep: 5067.97 | bwd_allreduce_microstep: 46.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 13:17:37,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.51 | bwd_microstep: 5193.43 | bwd_inner_microstep: 5132.18 | bwd_allreduce_microstep: 61.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 13:17:46,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.15 | bwd_microstep: 5173.35 | bwd_inner_microstep: 5090.04 | bwd_allreduce_microstep: 83.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3697 [2024-07-31 13:17:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.67 | bwd_microstep: 5069.65 | bwd_inner_microstep: 4997.97 | bwd_allreduce_microstep: 71.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 13:18:03,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 13:18:03,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.50 | bwd_microstep: 5115.14 | bwd_inner_microstep: 4717.93 | bwd_allreduce_microstep: 397.15 | step_microstep: 183.17 [2024-07-31 13:18:03,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29231.41 | bwd: 41084.67 | bwd_inner: 40353.60 | bwd_allreduce: 730.58 | 
step: 183.77 46%|████▌ | 566/1230 [11:06:09<12:58:35, 70.35s/it] {'loss': 1.1638, 'learning_rate': 1.1768169209909544e-05, 'epoch': 0.46} 46%|████▌ | 566/1230 [11:06:09<12:58:35, 70.35s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-31 13:18:12,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.27 | bwd_microstep: 5226.80 | bwd_inner_microstep: 5172.71 | bwd_allreduce_microstep: 54.02 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3976 [2024-07-31 13:18:21,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3847.78 | bwd_microstep: 5243.78 | bwd_inner_microstep: 5224.49 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-31 13:18:30,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.80 | bwd_microstep: 5331.31 | bwd_inner_microstep: 5255.75 | bwd_allreduce_microstep: 75.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 13:18:39,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.30 | bwd_microstep: 5173.12 | bwd_inner_microstep: 5093.73 | bwd_allreduce_microstep: 79.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 13:18:48,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.60 | bwd_microstep: 4892.39 | bwd_inner_microstep: 4873.03 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2175 [2024-07-31 13:18:57,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.60 | bwd_microstep: 5195.60 | bwd_inner_microstep: 4791.82 | bwd_allreduce_microstep: 403.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 
13.0, dynamic token length: 3701 [2024-07-31 13:19:05,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.39 | bwd_microstep: 4886.41 | bwd_inner_microstep: 4867.05 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 13:19:14,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:19:14,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.48 | bwd_microstep: 5005.55 | bwd_inner_microstep: 4948.61 | bwd_allreduce_microstep: 56.87 | step_microstep: 181.98 [2024-07-31 13:19:14,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29241.13 | bwd: 40954.94 | bwd_inner: 40227.13 | bwd_allreduce: 727.33 | step: 182.56 46%|████▌ | 567/1230 [11:07:20<12:58:01, 70.41s/it] {'loss': 1.1555, 'learning_rate': 1.174224447575767e-05, 'epoch': 0.46} 46%|████▌ | 567/1230 [11:07:20<12:58:01, 70.41s/it]dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1339 [2024-07-31 13:19:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.18 | bwd_microstep: 5389.43 | bwd_inner_microstep: 4974.12 | bwd_allreduce_microstep: 415.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3867 [2024-07-31 13:19:32,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.68 | bwd_microstep: 5343.50 | bwd_inner_microstep: 5274.69 | bwd_allreduce_microstep: 68.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3593 [2024-07-31 13:19:41,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.64 | bwd_microstep: 5142.34 | bwd_inner_microstep: 5045.33 | bwd_allreduce_microstep: 96.93 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3616 [2024-07-31 13:19:49,949] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.61 | bwd_microstep: 5184.47 | bwd_inner_microstep: 5091.36 | bwd_allreduce_microstep: 93.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 13:19:58,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.43 | bwd_microstep: 5014.11 | bwd_inner_microstep: 4989.62 | bwd_allreduce_microstep: 24.42 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-31 13:20:07,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.30 | bwd_microstep: 5073.37 | bwd_inner_microstep: 5007.38 | bwd_allreduce_microstep: 65.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 13:20:16,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.35 | bwd_microstep: 5163.80 | bwd_inner_microstep: 5087.10 | bwd_allreduce_microstep: 76.64 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 13:20:25,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 13:20:25,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.61 | bwd_microstep: 5052.86 | bwd_inner_microstep: 4993.65 | bwd_allreduce_microstep: 59.14 | step_microstep: 182.19 [2024-07-31 13:20:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28951.69 | bwd: 41363.86 | bwd_inner: 40463.20 | bwd_allreduce: 900.17 | step: 182.80 46%|████▌ | 568/1230 [11:08:30<12:57:38, 70.48s/it] {'loss': 1.1643, 'learning_rate': 1.1716307659915382e-05, 'epoch': 0.46} 46%|████▌ | 568/1230 [11:08:30<12:57:38, 70.48s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3902 [2024-07-31 13:20:34,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3863.93 | 
bwd_microstep: 5433.44 | bwd_inner_microstep: 5370.32 | bwd_allreduce_microstep: 63.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3945 [2024-07-31 13:20:43,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.58 | bwd_microstep: 5169.58 | bwd_inner_microstep: 5150.24 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580 [2024-07-31 13:20:52,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.86 | bwd_microstep: 5267.55 | bwd_inner_microstep: 5167.80 | bwd_allreduce_microstep: 99.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-31 13:21:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.86 | bwd_microstep: 5083.64 | bwd_inner_microstep: 5042.87 | bwd_allreduce_microstep: 40.70 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 13:21:09,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.32 | bwd_microstep: 5124.67 | bwd_inner_microstep: 5046.87 | bwd_allreduce_microstep: 77.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 13:21:17,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.22 | bwd_microstep: 5013.94 | bwd_inner_microstep: 4629.41 | bwd_allreduce_microstep: 384.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 13:21:26,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.85 | bwd_microstep: 4987.49 | bwd_inner_microstep: 4932.60 | bwd_allreduce_microstep: 54.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 13:21:35,018] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 13:21:35,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.49 | bwd_microstep: 4886.21 | bwd_inner_microstep: 4866.86 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.34 [2024-07-31 13:21:35,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28689.04 | bwd: 40966.51 | bwd_inner: 40206.91 | bwd_allreduce: 759.10 | step: 181.94 46%|████▋ | 569/1230 [11:09:40<12:54:49, 70.33s/it] {'loss': 1.1342, 'learning_rate': 1.169035894224295e-05, 'epoch': 0.46} 46%|████▋ | 569/1230 [11:09:40<12:54:49, 70.33s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 13:21:44,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3885.61 | bwd_microstep: 5786.04 | bwd_inner_microstep: 5710.62 | bwd_allreduce_microstep: 75.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 [2024-07-31 13:21:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.73 | bwd_microstep: 5402.83 | bwd_inner_microstep: 4986.74 | bwd_allreduce_microstep: 416.03 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2060 [2024-07-31 13:22:01,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2979.94 | bwd_microstep: 4964.27 | bwd_inner_microstep: 4584.89 | bwd_allreduce_microstep: 379.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-31 13:22:10,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.14 | bwd_microstep: 5249.22 | bwd_inner_microstep: 5229.87 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2233 [2024-07-31 13:22:19,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.26 | 
bwd_microstep: 5049.75 | bwd_inner_microstep: 4658.21 | bwd_allreduce_microstep: 391.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 13:22:28,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.40 | bwd_microstep: 5212.51 | bwd_inner_microstep: 4806.03 | bwd_allreduce_microstep: 406.41 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3697 [2024-07-31 13:22:36,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.31 | bwd_microstep: 4987.01 | bwd_inner_microstep: 4924.42 | bwd_allreduce_microstep: 62.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 13:22:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85 [2024-07-31 13:22:45,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.36 | bwd_microstep: 4986.76 | bwd_inner_microstep: 4967.30 | bwd_allreduce_microstep: 19.39 | step_microstep: 479.40 [2024-07-31 13:22:45,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28470.66 | bwd: 41638.37 | bwd_inner: 39868.01 | bwd_allreduce: 1769.88 | step: 479.97 46%|████▋ | 570/1230 [11:10:51<12:54:58, 70.45s/it] {'loss': 1.1887, 'learning_rate': 1.1664398502683194e-05, 'epoch': 0.46} 46%|████▋ | 570/1230 [11:10:51<12:54:58, 70.45s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3531 [2024-07-31 13:22:54,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.23 | bwd_microstep: 5473.99 | bwd_inner_microstep: 5290.74 | bwd_allreduce_microstep: 183.18 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3864 [2024-07-31 13:23:03,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.70 | bwd_microstep: 5183.83 | bwd_inner_microstep: 5147.68 | 
bwd_allreduce_microstep: 36.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3933 [2024-07-31 13:23:12,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.55 | bwd_microstep: 5178.91 | bwd_inner_microstep: 5139.80 | bwd_allreduce_microstep: 39.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-07-31 13:23:21,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.56 | bwd_microstep: 5093.97 | bwd_inner_microstep: 5052.86 | bwd_allreduce_microstep: 41.05 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-31 13:23:30,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.70 | bwd_microstep: 5025.69 | bwd_inner_microstep: 5006.31 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-31 13:23:38,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.45 | bwd_microstep: 5015.41 | bwd_inner_microstep: 4982.46 | bwd_allreduce_microstep: 32.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 13:23:47,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.75 | bwd_microstep: 5083.04 | bwd_inner_microstep: 5023.99 | bwd_allreduce_microstep: 58.98 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3651 [2024-07-31 13:23:56,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 13:23:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.27 | bwd_microstep: 4958.47 | bwd_inner_microstep: 4900.32 | bwd_allreduce_microstep: 58.08 | step_microstep: 182.42 [2024-07-31 13:23:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29086.11 | bwd: 41013.29 | bwd_inner: 40544.10 | bwd_allreduce: 468.70 | step: 183.02 46%|████▋ | 571/1230 [11:12:02<12:53:44, 70.45s/it] {'loss': 1.1293, 'learning_rate': 1.1638426521260211e-05, 'epoch': 0.46} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2293 [2024-07-31 13:24:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.00 | bwd_microstep: 5288.37 | bwd_inner_microstep: 4881.42 | bwd_allreduce_microstep: 406.88 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2826 [2024-07-31 13:24:14,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.32 | bwd_microstep: 5595.76 | bwd_inner_microstep: 5188.54 | bwd_allreduce_microstep: 407.14 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2264 [2024-07-31 13:24:23,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.76 | bwd_microstep: 5337.17 | bwd_inner_microstep: 4921.46 | bwd_allreduce_microstep: 415.63 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2046 [2024-07-31 13:24:32,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.19 | bwd_microstep: 5242.14 | bwd_inner_microstep: 4836.85 | bwd_allreduce_microstep: 405.22 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-31 13:24:40,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.06 | bwd_microstep: 4985.53 | bwd_inner_microstep: 4966.17 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 13:24:49,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.63 | bwd_microstep: 5096.22 | bwd_inner_microstep: 5028.17 | 
bwd_allreduce_microstep: 67.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 13:24:58,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.91 | bwd_microstep: 5043.08 | bwd_inner_microstep: 4986.38 | bwd_allreduce_microstep: 56.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 13:25:07,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 13:25:07,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.91 | bwd_microstep: 5306.91 | bwd_inner_microstep: 5144.73 | bwd_allreduce_microstep: 162.12 | step_microstep: 181.55 [2024-07-31 13:25:07,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28754.67 | bwd: 41895.16 | bwd_inner: 39953.66 | bwd_allreduce: 1941.01 | step: 182.24 47%|████▋ | 572/1230 [11:13:13<12:54:19, 70.61s/it] {'loss': 1.1726, 'learning_rate': 1.1612443178078138e-05, 'epoch': 0.46} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3909 [2024-07-31 13:25:16,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.16 | bwd_microstep: 5472.16 | bwd_inner_microstep: 5406.95 | bwd_allreduce_microstep: 65.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 13:25:25,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.45 | bwd_microstep: 5208.96 | bwd_inner_microstep: 5132.35 | bwd_allreduce_microstep: 76.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 13:25:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.64 | bwd_microstep: 5158.41 | bwd_inner_microstep: 5120.94 | bwd_allreduce_microstep: 37.40 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 13:25:42,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.94 | bwd_microstep: 5062.55 | bwd_inner_microstep: 4667.92 | bwd_allreduce_microstep: 394.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 13:25:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.68 | bwd_microstep: 5070.17 | bwd_inner_microstep: 5012.86 | bwd_allreduce_microstep: 57.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 13:26:00,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.42 | bwd_microstep: 5271.98 | bwd_inner_microstep: 4864.10 | bwd_allreduce_microstep: 407.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 13:26:08,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.23 | bwd_microstep: 4717.49 | bwd_inner_microstep: 4693.14 | bwd_allreduce_microstep: 24.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2123 [2024-07-31 13:26:17,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 13:26:17,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.70 | bwd_microstep: 5118.22 | bwd_inner_microstep: 4719.36 | bwd_allreduce_microstep: 398.79 | step_microstep: 181.28 [2024-07-31 13:26:17,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28630.13 | bwd: 41079.92 | bwd_inner: 39617.56 | bwd_allreduce: 1461.88 | step: 181.86 47%|████▋ | 573/1230 [11:14:23<12:51:17, 70.44s/it] {'loss': 1.1474, 'learning_rate': 1.1586448653319908e-05, 'epoch': 0.47} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3964 [2024-07-31 
13:26:26,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.59 | bwd_microstep: 5335.90 | bwd_inner_microstep: 5278.80 | bwd_allreduce_microstep: 57.03 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3773 [2024-07-31 13:26:35,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.62 | bwd_microstep: 5299.20 | bwd_inner_microstep: 5208.86 | bwd_allreduce_microstep: 90.28 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 13:26:43,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.96 | bwd_microstep: 5010.46 | bwd_inner_microstep: 4986.45 | bwd_allreduce_microstep: 23.94 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2084 [2024-07-31 13:26:52,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.52 | bwd_microstep: 5149.50 | bwd_inner_microstep: 4750.14 | bwd_allreduce_microstep: 399.29 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2161 [2024-07-31 13:27:01,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.14 | bwd_microstep: 5075.92 | bwd_inner_microstep: 4682.59 | bwd_allreduce_microstep: 393.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 13:27:09,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.94 | bwd_microstep: 5130.16 | bwd_inner_microstep: 5058.28 | bwd_allreduce_microstep: 71.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 13:27:18,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.19 | bwd_microstep: 5033.38 | bwd_inner_microstep: 4973.91 | bwd_allreduce_microstep: 59.41 | step_microstep: 0.08 dynamic ViT batch size: 16, images 
per sample: 8.0, dynamic token length: 3720 [2024-07-31 13:27:27,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 13:27:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.44 | bwd_microstep: 5002.68 | bwd_inner_microstep: 4949.13 | bwd_allreduce_microstep: 53.48 | step_microstep: 182.84 [2024-07-31 13:27:27,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28789.31 | bwd: 41037.19 | bwd_inner: 39888.09 | bwd_allreduce: 1148.62 | step: 183.42 47%|████▋ | 574/1230 [11:15:33<12:49:11, 70.35s/it] {'loss': 1.1406, 'learning_rate': 1.156044312724598e-05, 'epoch': 0.47} dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2760 [2024-07-31 13:27:36,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.41 | bwd_microstep: 5419.87 | bwd_inner_microstep: 5003.64 | bwd_allreduce_microstep: 416.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3853 [2024-07-31 13:27:45,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.93 | bwd_microstep: 5063.72 | bwd_inner_microstep: 5028.23 | bwd_allreduce_microstep: 35.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 13:27:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.32 | bwd_microstep: 5244.35 | bwd_inner_microstep: 4836.54 | bwd_allreduce_microstep: 407.75 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3757 [2024-07-31 13:28:02,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.36 | bwd_microstep: 5061.52 | bwd_inner_microstep: 5029.64 | bwd_allreduce_microstep: 31.82 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-31 13:28:11,494] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.54 | bwd_microstep: 5012.04 | bwd_inner_microstep: 4991.76 | bwd_allreduce_microstep: 20.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 13:28:19,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.51 | bwd_microstep: 4831.34 | bwd_inner_microstep: 4806.52 | bwd_allreduce_microstep: 24.76 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 13:28:28,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.32 | bwd_microstep: 5188.03 | bwd_inner_microstep: 4785.72 | bwd_allreduce_microstep: 402.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 13:28:37,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 13:28:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.00 | bwd_microstep: 5111.67 | bwd_inner_microstep: 5042.14 | bwd_allreduce_microstep: 69.45 | step_microstep: 181.44 [2024-07-31 13:28:37,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28604.30 | bwd: 40932.52 | bwd_inner: 39524.12 | bwd_allreduce: 1407.92 | step: 182.04 47%|████▋ | 575/1230 [11:16:43<12:46:25, 70.21s/it] {'loss': 1.1759, 'learning_rate': 1.1534426780193114e-05, 'epoch': 0.47} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3916 [2024-07-31 13:28:46,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.90 | bwd_microstep: 5374.05 | bwd_inner_microstep: 5312.19 | bwd_allreduce_microstep: 61.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-31 13:28:55,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.55 | 
bwd_microstep: 5201.52 | bwd_inner_microstep: 4798.43 | bwd_allreduce_microstep: 403.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 13:29:03,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.03 | bwd_microstep: 5140.09 | bwd_inner_microstep: 5066.66 | bwd_allreduce_microstep: 73.36 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2255 [2024-07-31 13:29:12,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.95 | bwd_microstep: 5120.00 | bwd_inner_microstep: 4722.06 | bwd_allreduce_microstep: 397.88 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2085 [2024-07-31 13:29:20,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2989.47 | bwd_microstep: 4846.17 | bwd_inner_microstep: 4471.31 | bwd_allreduce_microstep: 374.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 13:29:29,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.73 | bwd_microstep: 5132.29 | bwd_inner_microstep: 5054.30 | bwd_allreduce_microstep: 77.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 13:29:37,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.52 | bwd_microstep: 4929.21 | bwd_inner_microstep: 4903.19 | bwd_allreduce_microstep: 25.96 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3707 [2024-07-31 13:29:45,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 13:29:45,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3094.43 | bwd_microstep: 4778.95 | bwd_inner_microstep: 4745.07 | bwd_allreduce_microstep: 33.80 | step_microstep: 182.52 [2024-07-31 
13:29:45,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27716.52 | bwd: 40522.25 | bwd_inner: 39073.14 | bwd_allreduce: 1448.62 | step: 183.10 47%|████▋ | 576/1230 [11:17:51<12:39:53, 69.72s/it] {'loss': 1.1365, 'learning_rate': 1.1508399792573095e-05, 'epoch': 0.47} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3900 [2024-07-31 13:29:54,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.80 | bwd_microstep: 5181.30 | bwd_inner_microstep: 5146.54 | bwd_allreduce_microstep: 34.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-07-31 13:30:03,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.00 | bwd_microstep: 5178.73 | bwd_inner_microstep: 5088.65 | bwd_allreduce_microstep: 90.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 13:30:12,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.35 | bwd_microstep: 5318.77 | bwd_inner_microstep: 5220.18 | bwd_allreduce_microstep: 98.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 13:30:21,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.20 | bwd_microstep: 5316.95 | bwd_inner_microstep: 5219.56 | bwd_allreduce_microstep: 97.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 13:30:30,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.10 | bwd_microstep: 5116.16 | bwd_inner_microstep: 5065.85 | bwd_allreduce_microstep: 50.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 13:30:38,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.98 | 
bwd_microstep: 4979.22 | bwd_inner_microstep: 4959.74 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2138 [2024-07-31 13:30:46,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.35 | bwd_microstep: 4917.72 | bwd_inner_microstep: 4539.73 | bwd_allreduce_microstep: 377.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 13:30:55,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 13:30:55,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.26 | bwd_microstep: 5053.04 | bwd_inner_microstep: 4993.38 | bwd_allreduce_microstep: 59.59 | step_microstep: 181.36 [2024-07-31 13:30:55,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28460.94 | bwd: 41061.87 | bwd_inner: 40233.57 | bwd_allreduce: 827.82 | step: 181.94 47%|████▋ | 577/1230 [11:19:01<12:39:11, 69.76s/it] {'loss': 1.1808, 'learning_rate': 1.1482362344871516e-05, 'epoch': 0.47} dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2031 [2024-07-31 13:31:04,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.72 | bwd_microstep: 5585.80 | bwd_inner_microstep: 5173.07 | bwd_allreduce_microstep: 412.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 13:31:13,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.81 | bwd_microstep: 5162.50 | bwd_inner_microstep: 5080.80 | bwd_allreduce_microstep: 81.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-31 13:31:22,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.12 | bwd_microstep: 5156.38 | bwd_inner_microstep: 4755.65 | bwd_allreduce_microstep: 
400.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 13:31:31,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.36 | bwd_microstep: 5051.08 | bwd_inner_microstep: 5022.67 | bwd_allreduce_microstep: 28.34 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3732 [2024-07-31 13:31:39,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.00 | bwd_microstep: 5087.60 | bwd_inner_microstep: 5031.06 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 13:31:48,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.32 | bwd_microstep: 4974.38 | bwd_inner_microstep: 4939.15 | bwd_allreduce_microstep: 35.16 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3681 [2024-07-31 13:31:56,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.37 | bwd_microstep: 4946.60 | bwd_inner_microstep: 4896.44 | bwd_allreduce_microstep: 50.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 13:32:05,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 13:32:05,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.02 | bwd_microstep: 5005.21 | bwd_inner_microstep: 4954.35 | bwd_allreduce_microstep: 50.80 | step_microstep: 182.84 [2024-07-31 13:32:05,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28699.65 | bwd: 40969.54 | bwd_inner: 39853.11 | bwd_allreduce: 1115.94 | step: 183.44 47%|████▋ | 578/1230 [11:20:11<12:38:49, 69.83s/it] {'loss': 1.206, 'learning_rate': 1.1456314617646485e-05, 'epoch': 0.47} dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 4096 [2024-07-31 13:32:15,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3927.90 | bwd_microstep: 5438.23 | bwd_inner_microstep: 5409.90 | bwd_allreduce_microstep: 28.26 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2043 [2024-07-31 13:32:24,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.18 | bwd_microstep: 5388.65 | bwd_inner_microstep: 4972.45 | bwd_allreduce_microstep: 416.13 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-31 13:32:32,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.23 | bwd_microstep: 5018.90 | bwd_inner_microstep: 4999.50 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3905 [2024-07-31 13:32:41,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.59 | bwd_microstep: 5158.47 | bwd_inner_microstep: 5139.19 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 13:32:50,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.00 | bwd_microstep: 5149.65 | bwd_inner_microstep: 5099.11 | bwd_allreduce_microstep: 50.48 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 13:32:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.18 | bwd_microstep: 5008.30 | bwd_inner_microstep: 4972.46 | bwd_allreduce_microstep: 35.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 13:33:07,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.42 | bwd_microstep: 5014.90 | bwd_inner_microstep: 4995.47 | bwd_allreduce_microstep: 19.36 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-31 13:33:16,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:33:16,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.86 | bwd_microstep: 5050.32 | bwd_inner_microstep: 4658.18 | bwd_allreduce_microstep: 392.06 | step_microstep: 181.88 [2024-07-31 13:33:16,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29461.25 | bwd: 41227.40 | bwd_inner: 40246.20 | bwd_allreduce: 980.71 | step: 182.59 47%|████▋ | 579/1230 [11:21:22<12:41:32, 70.19s/it] {'loss': 1.1531, 'learning_rate': 1.143025679152741e-05, 'epoch': 0.47} dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3840 [2024-07-31 13:33:25,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.29 | bwd_microstep: 5171.44 | bwd_inner_microstep: 5141.93 | bwd_allreduce_microstep: 29.45 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1034 [2024-07-31 13:33:34,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.69 | bwd_microstep: 5269.66 | bwd_inner_microstep: 4863.44 | bwd_allreduce_microstep: 406.15 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2840 [2024-07-31 13:33:43,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.18 | bwd_microstep: 5249.78 | bwd_inner_microstep: 4840.45 | bwd_allreduce_microstep: 409.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3655 [2024-07-31 13:33:51,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.20 | bwd_microstep: 4963.02 | bwd_inner_microstep: 4931.13 | bwd_allreduce_microstep: 31.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3738 [2024-07-31 13:34:00,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3448.38 | bwd_microstep: 4861.19 | bwd_inner_microstep: 4841.91 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 13:34:09,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.56 | bwd_microstep: 5190.22 | bwd_inner_microstep: 5112.55 | bwd_allreduce_microstep: 77.59 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3689 [2024-07-31 13:34:17,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.46 | bwd_microstep: 5032.18 | bwd_inner_microstep: 4963.49 | bwd_allreduce_microstep: 68.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 13:34:26,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 13:34:26,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.66 | bwd_microstep: 5064.21 | bwd_inner_microstep: 5003.45 | bwd_allreduce_microstep: 60.68 | step_microstep: 181.78 [2024-07-31 13:34:26,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28689.33 | bwd: 40801.68 | bwd_inner: 39698.30 | bwd_allreduce: 1102.89 | step: 182.39 47%|████▋ | 580/1230 [11:22:32<12:39:11, 70.08s/it] {'loss': 1.1951, 'learning_rate': 1.1404189047213716e-05, 'epoch': 0.47} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 13:34:35,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.73 | bwd_microstep: 5322.45 | bwd_inner_microstep: 5243.85 | bwd_allreduce_microstep: 78.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3973 [2024-07-31 13:34:44,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3652.49 | bwd_microstep: 5193.05 | bwd_inner_microstep: 5146.46 | bwd_allreduce_microstep: 46.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-07-31 13:34:53,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.89 | bwd_microstep: 5079.05 | bwd_inner_microstep: 5058.32 | bwd_allreduce_microstep: 20.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 13:35:01,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.91 | bwd_microstep: 5069.74 | bwd_inner_microstep: 5030.14 | bwd_allreduce_microstep: 39.54 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3716 [2024-07-31 13:35:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.78 | bwd_microstep: 5091.01 | bwd_inner_microstep: 5051.51 | bwd_allreduce_microstep: 39.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3662 [2024-07-31 13:35:19,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.75 | bwd_microstep: 5087.65 | bwd_inner_microstep: 5005.44 | bwd_allreduce_microstep: 82.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 13:35:27,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.11 | bwd_microstep: 4899.79 | bwd_inner_microstep: 4880.46 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 13:35:36,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 13:35:36,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.48 | bwd_microstep: 5081.43 | bwd_inner_microstep: 4687.20 | bwd_allreduce_microstep: 394.16 | step_microstep: 
182.12 [2024-07-31 13:35:36,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29135.03 | bwd: 40824.14 | bwd_inner: 40103.32 | bwd_allreduce: 720.34 | step: 182.70 47%|████▋ | 581/1230 [11:23:42<12:38:42, 70.14s/it] {'loss': 1.1524, 'learning_rate': 1.137811156547362e-05, 'epoch': 0.47} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 13:35:46,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.70 | bwd_microstep: 5593.31 | bwd_inner_microstep: 5574.17 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2052 [2024-07-31 13:35:55,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.72 | bwd_microstep: 5249.07 | bwd_inner_microstep: 4843.98 | bwd_allreduce_microstep: 405.01 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2061 [2024-07-31 13:36:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.48 | bwd_microstep: 5281.60 | bwd_inner_microstep: 4872.43 | bwd_allreduce_microstep: 409.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 13:36:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.43 | bwd_microstep: 5076.39 | bwd_inner_microstep: 5007.96 | bwd_allreduce_microstep: 68.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 13:36:20,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.33 | bwd_microstep: 4919.29 | bwd_inner_microstep: 4539.56 | bwd_allreduce_microstep: 379.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 13:36:29,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3588.95 | bwd_microstep: 5065.87 | bwd_inner_microstep: 5002.95 | bwd_allreduce_microstep: 62.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3626 [2024-07-31 13:36:37,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.37 | bwd_microstep: 4996.71 | bwd_inner_microstep: 4922.51 | bwd_allreduce_microstep: 74.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 13:36:46,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 13:36:46,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3309.15 | bwd_microstep: 4762.20 | bwd_inner_microstep: 4737.65 | bwd_allreduce_microstep: 24.48 | step_microstep: 183.32 [2024-07-31 13:36:46,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27947.05 | bwd: 40944.42 | bwd_inner: 39501.15 | bwd_allreduce: 1442.78 | step: 183.90 47%|████▋ | 582/1230 [11:24:51<12:34:32, 69.87s/it] {'loss': 1.1732, 'learning_rate': 1.1352024527142857e-05, 'epoch': 0.47} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3942 [2024-07-31 13:36:55,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3883.06 | bwd_microstep: 5332.34 | bwd_inner_microstep: 5292.74 | bwd_allreduce_microstep: 39.53 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3840 [2024-07-31 13:37:04,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.99 | bwd_microstep: 5246.35 | bwd_inner_microstep: 5170.10 | bwd_allreduce_microstep: 76.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 13:37:12,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.58 | bwd_microstep: 5122.27 | bwd_inner_microstep: 5077.75 | 
bwd_allreduce_microstep: 44.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 13:37:21,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.81 | bwd_microstep: 5213.94 | bwd_inner_microstep: 5129.76 | bwd_allreduce_microstep: 84.11 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-31 13:37:30,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.99 | bwd_microstep: 5300.78 | bwd_inner_microstep: 5232.77 | bwd_allreduce_microstep: 67.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 13:37:39,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.11 | bwd_microstep: 5007.90 | bwd_inner_microstep: 4983.60 | bwd_allreduce_microstep: 24.23 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-31 13:37:47,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.64 | bwd_microstep: 4952.72 | bwd_inner_microstep: 4569.09 | bwd_allreduce_microstep: 383.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-31 13:37:55,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 13:37:55,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.41 | bwd_microstep: 4763.73 | bwd_inner_microstep: 4734.19 | bwd_allreduce_microstep: 29.47 | step_microstep: 211.19 [2024-07-31 13:37:55,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28405.51 | bwd: 40940.02 | bwd_inner: 40189.93 | bwd_allreduce: 749.58 | step: 211.89 47%|████▋ | 583/1230 [11:26:01<12:32:52, 69.82s/it] {'loss': 1.1932, 'learning_rate': 1.1325928113123431e-05, 'epoch': 0.47} dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 13:38:04,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.92 | bwd_microstep: 5203.45 | bwd_inner_microstep: 5184.39 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-31 13:38:13,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.11 | bwd_microstep: 5278.49 | bwd_inner_microstep: 4868.95 | bwd_allreduce_microstep: 409.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-31 13:38:22,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.54 | bwd_microstep: 5042.58 | bwd_inner_microstep: 5015.90 | bwd_allreduce_microstep: 26.61 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2207 [2024-07-31 13:38:31,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.69 | bwd_microstep: 5235.84 | bwd_inner_microstep: 4830.08 | bwd_allreduce_microstep: 405.69 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 13:38:40,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.13 | bwd_microstep: 5211.80 | bwd_inner_microstep: 5129.43 | bwd_allreduce_microstep: 82.30 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2162 [2024-07-31 13:38:48,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.74 | bwd_microstep: 5176.71 | bwd_inner_microstep: 4773.33 | bwd_allreduce_microstep: 403.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 13:38:57,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3407.33 | bwd_microstep: 4832.00 | bwd_inner_microstep: 4800.15 | 
bwd_allreduce_microstep: 31.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 13:39:05,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 13:39:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.02 | bwd_microstep: 5030.57 | bwd_inner_microstep: 4975.50 | bwd_allreduce_microstep: 55.00 | step_microstep: 181.29 [2024-07-31 13:39:05,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28773.37 | bwd: 41011.41 | bwd_inner: 39577.66 | bwd_allreduce: 1433.27 | step: 181.89 47%|████▋ | 584/1230 [11:27:11<12:32:39, 69.91s/it] {'loss': 1.1737, 'learning_rate': 1.1299822504382365e-05, 'epoch': 0.47} 47%|████▋ | 584/1230 [11:27:11<12:32:39, 69.91s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806 [2024-07-31 13:39:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.84 | bwd_microstep: 5284.51 | bwd_inner_microstep: 5218.62 | bwd_allreduce_microstep: 65.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-31 13:39:23,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.43 | bwd_microstep: 5190.96 | bwd_inner_microstep: 5153.94 | bwd_allreduce_microstep: 36.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-31 13:39:32,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.15 | bwd_microstep: 5347.74 | bwd_inner_microstep: 5270.14 | bwd_allreduce_microstep: 77.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3864 [2024-07-31 13:39:41,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.38 | bwd_microstep: 5172.17 | bwd_inner_microstep: 5128.44 | bwd_allreduce_microstep: 43.66 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2248 [2024-07-31 13:39:49,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3047.79 | bwd_microstep: 5002.24 | bwd_inner_microstep: 4618.28 | bwd_allreduce_microstep: 383.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 13:39:58,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.87 | bwd_microstep: 5170.16 | bwd_inner_microstep: 5115.86 | bwd_allreduce_microstep: 54.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 13:40:07,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.16 | bwd_microstep: 5132.13 | bwd_inner_microstep: 5065.04 | bwd_allreduce_microstep: 67.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 13:40:16,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 13:40:16,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.76 | bwd_microstep: 5121.48 | bwd_inner_microstep: 5052.59 | bwd_allreduce_microstep: 68.82 | step_microstep: 181.95 [2024-07-31 13:40:16,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28599.29 | bwd: 41421.36 | bwd_inner: 40622.85 | bwd_allreduce: 798.03 | step: 182.53 48%|████▊ | 585/1230 [11:28:22<12:32:56, 70.04s/it] {'loss': 1.1712, 'learning_rate': 1.1273707881950447e-05, 'epoch': 0.48} 48%|████▊ | 585/1230 [11:28:22<12:32:56, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4012 [2024-07-31 13:40:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3828.68 | bwd_microstep: 5258.42 | bwd_inner_microstep: 5239.38 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320 [2024-07-31 13:40:34,225] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.54 | bwd_microstep: 5311.73 | bwd_inner_microstep: 4901.29 | bwd_allreduce_microstep: 410.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3796 [2024-07-31 13:40:42,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.14 | bwd_microstep: 5078.73 | bwd_inner_microstep: 5038.81 | bwd_allreduce_microstep: 39.84 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3705 [2024-07-31 13:40:51,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.44 | bwd_microstep: 4871.57 | bwd_inner_microstep: 4844.91 | bwd_allreduce_microstep: 26.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 13:41:00,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.62 | bwd_microstep: 4982.95 | bwd_inner_microstep: 4963.58 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 13:41:08,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.99 | bwd_microstep: 5120.75 | bwd_inner_microstep: 5055.02 | bwd_allreduce_microstep: 65.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 13:41:17,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.63 | bwd_microstep: 5176.08 | bwd_inner_microstep: 5100.73 | bwd_allreduce_microstep: 75.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 13:41:26,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 13:41:26,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.05 | bwd_microstep: 5028.91 | bwd_inner_microstep: 4967.76 | 
bwd_allreduce_microstep: 61.08 | step_microstep: 181.90 [2024-07-31 13:41:26,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29072.00 | bwd: 40829.11 | bwd_inner: 40111.42 | bwd_allreduce: 717.19 | step: 182.50 48%|████▊ | 586/1230 [11:29:32<12:32:23, 70.10s/it] {'loss': 1.2088, 'learning_rate': 1.1247584426920962e-05, 'epoch': 0.48} 48%|████▊ | 586/1230 [11:29:32<12:32:23, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3539 [2024-07-31 13:41:35,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.97 | bwd_microstep: 5436.09 | bwd_inner_microstep: 5265.91 | bwd_allreduce_microstep: 170.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-31 13:41:44,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.48 | bwd_microstep: 5425.13 | bwd_inner_microstep: 5005.32 | bwd_allreduce_microstep: 419.74 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3932 [2024-07-31 13:41:52,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3285.91 | bwd_microstep: 4968.65 | bwd_inner_microstep: 4949.31 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 13:42:01,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.50 | bwd_microstep: 5109.85 | bwd_inner_microstep: 5058.15 | bwd_allreduce_microstep: 51.63 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3830 [2024-07-31 13:42:10,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.42 | bwd_microstep: 5093.09 | bwd_inner_microstep: 5036.45 | bwd_allreduce_microstep: 56.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 13:42:18,578] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.34 | bwd_microstep: 4847.86 | bwd_inner_microstep: 4820.02 | bwd_allreduce_microstep: 27.77 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 13:42:27,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.50 | bwd_microstep: 5086.71 | bwd_inner_microstep: 4690.64 | bwd_allreduce_microstep: 396.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 13:42:35,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:42:35,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.39 | bwd_microstep: 4987.54 | bwd_inner_microstep: 4935.06 | bwd_allreduce_microstep: 52.41 | step_microstep: 182.37 [2024-07-31 13:42:35,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28199.41 | bwd: 40954.90 | bwd_inner: 39760.80 | bwd_allreduce: 1193.61 | step: 183.07 48%|████▊ | 587/1230 [11:30:41<12:29:15, 69.91s/it] {'loss': 1.127, 'learning_rate': 1.1221452320448447e-05, 'epoch': 0.48} 48%|████▊ | 587/1230 [11:30:41<12:29:15, 69.91s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 13:42:44,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3287.64 | bwd_microstep: 4967.74 | bwd_inner_microstep: 4931.68 | bwd_allreduce_microstep: 35.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-07-31 13:42:53,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.75 | bwd_microstep: 5407.26 | bwd_inner_microstep: 4988.18 | bwd_allreduce_microstep: 419.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3597 [2024-07-31 13:43:02,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.68 | 
bwd_microstep: 5190.08 | bwd_inner_microstep: 5092.58 | bwd_allreduce_microstep: 97.43 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2106 [2024-07-31 13:43:10,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.90 | bwd_microstep: 5174.03 | bwd_inner_microstep: 4772.44 | bwd_allreduce_microstep: 401.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 13:43:19,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.04 | bwd_microstep: 4887.49 | bwd_inner_microstep: 4868.16 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 13:43:28,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.91 | bwd_microstep: 5085.96 | bwd_inner_microstep: 5021.83 | bwd_allreduce_microstep: 64.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 13:43:36,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.72 | bwd_microstep: 5052.72 | bwd_inner_microstep: 5011.50 | bwd_allreduce_microstep: 41.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3665 [2024-07-31 13:43:45,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 13:43:45,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.22 | bwd_microstep: 5073.89 | bwd_inner_microstep: 4992.50 | bwd_allreduce_microstep: 81.32 | step_microstep: 181.41 [2024-07-31 13:43:45,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28636.76 | bwd: 40839.15 | bwd_inner: 39678.82 | bwd_allreduce: 1159.85 | step: 182.00 48%|████▊ | 588/1230 [11:31:51<12:27:43, 69.88s/it] {'loss': 1.2277, 'learning_rate': 1.1195311743747445e-05, 'epoch': 0.48} 48%|████▊ | 588/1230 
[11:31:51<12:27:43, 69.88s/it]dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2774 [2024-07-31 13:43:54,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.06 | bwd_microstep: 5133.27 | bwd_inner_microstep: 4741.56 | bwd_allreduce_microstep: 391.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2342 [2024-07-31 13:44:02,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.85 | bwd_microstep: 5208.02 | bwd_inner_microstep: 4804.06 | bwd_allreduce_microstep: 403.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-31 13:44:11,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.61 | bwd_microstep: 5158.92 | bwd_inner_microstep: 5104.24 | bwd_allreduce_microstep: 54.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 13:44:20,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.99 | bwd_microstep: 5132.60 | bwd_inner_microstep: 5055.29 | bwd_allreduce_microstep: 77.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-31 13:44:29,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.32 | bwd_microstep: 5138.57 | bwd_inner_microstep: 5086.32 | bwd_allreduce_microstep: 52.19 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2166 [2024-07-31 13:44:37,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.19 | bwd_microstep: 4905.48 | bwd_inner_microstep: 4530.19 | bwd_allreduce_microstep: 375.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 13:44:45,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.09 | bwd_microstep: 
5030.54 | bwd_inner_microstep: 4970.47 | bwd_allreduce_microstep: 60.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 13:44:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 13:44:54,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.28 | bwd_microstep: 4960.89 | bwd_inner_microstep: 4928.61 | bwd_allreduce_microstep: 32.20 | step_microstep: 181.87 [2024-07-31 13:44:54,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27825.30 | bwd: 40668.28 | bwd_inner: 39220.68 | bwd_allreduce: 1447.13 | step: 182.45 48%|████▊ | 589/1230 [11:33:00<12:23:10, 69.56s/it] {'loss': 1.2246, 'learning_rate': 1.116916287809122e-05, 'epoch': 0.48} 48%|████▊ | 589/1230 [11:33:00<12:23:10, 69.56s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3936 [2024-07-31 13:45:03,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.50 | bwd_microstep: 5598.80 | bwd_inner_microstep: 5503.80 | bwd_allreduce_microstep: 94.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3952 [2024-07-31 13:45:12,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.37 | bwd_microstep: 5195.76 | bwd_inner_microstep: 5176.48 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 13:45:21,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.14 | bwd_microstep: 5126.75 | bwd_inner_microstep: 5084.67 | bwd_allreduce_microstep: 42.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3634 [2024-07-31 13:45:30,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.96 | bwd_microstep: 5245.39 | bwd_inner_microstep: 5145.33 | bwd_allreduce_microstep: 99.99 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 13:45:39,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.61 | bwd_microstep: 5133.72 | bwd_inner_microstep: 5081.57 | bwd_allreduce_microstep: 52.08 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127 [2024-07-31 13:45:48,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.47 | bwd_microstep: 5220.51 | bwd_inner_microstep: 4815.13 | bwd_allreduce_microstep: 405.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-07-31 13:45:56,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.69 | bwd_microstep: 4871.11 | bwd_inner_microstep: 4851.82 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3658 [2024-07-31 13:46:05,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 13:46:05,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.35 | bwd_microstep: 4917.64 | bwd_inner_microstep: 4890.78 | bwd_allreduce_microstep: 26.79 | step_microstep: 182.15 [2024-07-31 13:46:05,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29339.00 | bwd: 41309.66 | bwd_inner: 40549.52 | bwd_allreduce: 759.66 | step: 182.74 48%|████▊ | 590/1230 [11:34:11<12:26:32, 69.99s/it] {'loss': 1.1174, 'learning_rate': 1.1143005904810527e-05, 'epoch': 0.48} 48%|████▊ | 590/1230 [11:34:11<12:26:32, 69.99s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3887 [2024-07-31 13:46:14,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3836.87 | bwd_microstep: 5286.52 | bwd_inner_microstep: 5249.58 | bwd_allreduce_microstep: 36.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2221 [2024-07-31 13:46:23,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.85 | bwd_microstep: 5440.59 | bwd_inner_microstep: 5021.18 | bwd_allreduce_microstep: 419.34 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-31 13:46:32,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.02 | bwd_microstep: 5172.37 | bwd_inner_microstep: 4770.23 | bwd_allreduce_microstep: 402.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 13:46:41,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.90 | bwd_microstep: 5188.77 | bwd_inner_microstep: 5108.19 | bwd_allreduce_microstep: 80.51 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2170 [2024-07-31 13:46:50,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.66 | bwd_microstep: 5185.68 | bwd_inner_microstep: 4782.27 | bwd_allreduce_microstep: 403.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 13:46:58,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.74 | bwd_microstep: 4985.85 | bwd_inner_microstep: 4966.43 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 13:47:07,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.25 | bwd_microstep: 4897.56 | bwd_inner_microstep: 4878.16 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.19 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3162 [2024-07-31 13:47:16,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:47:16,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.73 | 
bwd_microstep: 5132.60 | bwd_inner_microstep: 4881.42 | bwd_allreduce_microstep: 251.10 | step_microstep: 181.37 [2024-07-31 13:47:16,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29140.93 | bwd: 41289.93 | bwd_inner: 39657.42 | bwd_allreduce: 1632.02 | step: 182.08 48%|████▊ | 591/1230 [11:35:22<12:27:50, 70.22s/it] {'loss': 1.1491, 'learning_rate': 1.1116841005292335e-05, 'epoch': 0.48} 48%|████▊ | 591/1230 [11:35:22<12:27:50, 70.22s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4057 [2024-07-31 13:47:25,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3841.27 | bwd_microstep: 5314.94 | bwd_inner_microstep: 5295.85 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3073 [2024-07-31 13:47:33,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.96 | bwd_microstep: 4835.29 | bwd_inner_microstep: 4708.91 | bwd_allreduce_microstep: 126.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 13:47:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.52 | bwd_microstep: 5377.86 | bwd_inner_microstep: 4963.95 | bwd_allreduce_microstep: 413.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 13:47:51,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.34 | bwd_microstep: 5222.22 | bwd_inner_microstep: 5129.59 | bwd_allreduce_microstep: 92.56 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-31 13:47:59,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.01 | bwd_microstep: 4993.65 | bwd_inner_microstep: 4959.67 | bwd_allreduce_microstep: 33.90 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3713 [2024-07-31 13:48:08,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.92 | bwd_microstep: 4991.05 | bwd_inner_microstep: 4967.92 | bwd_allreduce_microstep: 23.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2113 [2024-07-31 13:48:16,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.62 | bwd_microstep: 4912.66 | bwd_inner_microstep: 4534.28 | bwd_allreduce_microstep: 378.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-31 13:48:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 13:48:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.00 | bwd_microstep: 5037.87 | bwd_inner_microstep: 4971.57 | bwd_allreduce_microstep: 66.23 | step_microstep: 181.29 [2024-07-31 13:48:25,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28388.53 | bwd: 40685.50 | bwd_inner: 39531.69 | bwd_allreduce: 1153.33 | step: 181.89 48%|████▊ | 592/1230 [11:36:31<12:24:05, 69.98s/it] {'loss': 1.1252, 'learning_rate': 1.1090668360978587e-05, 'epoch': 0.48} 48%|████▊ | 592/1230 [11:36:31<12:24:05, 69.98s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 13:48:34,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3888.48 | bwd_microstep: 5383.57 | bwd_inner_microstep: 5364.55 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 13:48:43,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.75 | bwd_microstep: 5146.70 | bwd_inner_microstep: 5101.31 | bwd_allreduce_microstep: 45.33 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3717 [2024-07-31 13:48:52,475] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3611.52 | bwd_microstep: 5095.68 | bwd_inner_microstep: 5055.88 | bwd_allreduce_microstep: 39.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794 [2024-07-31 13:49:01,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.79 | bwd_microstep: 5064.54 | bwd_inner_microstep: 5043.48 | bwd_allreduce_microstep: 21.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-31 13:49:10,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.61 | bwd_microstep: 5091.78 | bwd_inner_microstep: 5048.00 | bwd_allreduce_microstep: 43.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3719 [2024-07-31 13:49:18,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.52 | bwd_microstep: 5102.40 | bwd_inner_microstep: 5032.31 | bwd_allreduce_microstep: 70.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3638 [2024-07-31 13:49:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.23 | bwd_microstep: 5114.50 | bwd_inner_microstep: 5017.66 | bwd_allreduce_microstep: 96.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-31 13:49:36,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 13:49:36,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.72 | bwd_microstep: 5003.06 | bwd_inner_microstep: 4965.94 | bwd_allreduce_microstep: 37.06 | step_microstep: 181.89 [2024-07-31 13:49:36,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29183.52 | bwd: 41002.21 | bwd_inner: 40629.06 | bwd_allreduce: 372.67 | step: 182.47 48%|████▊ | 593/1230 [11:37:42<12:24:37, 70.14s/it] {'loss': 1.2098, 'learning_rate': 1.106448815336493e-05, 
'epoch': 0.48} 48%|████▊ | 593/1230 [11:37:42<12:24:37, 70.14s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3888 [2024-07-31 13:49:45,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.41 | bwd_microstep: 5549.14 | bwd_inner_microstep: 5442.64 | bwd_allreduce_microstep: 106.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 13:49:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.17 | bwd_microstep: 5255.98 | bwd_inner_microstep: 5161.07 | bwd_allreduce_microstep: 94.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3813 [2024-07-31 13:50:03,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.59 | bwd_microstep: 5236.57 | bwd_inner_microstep: 5182.50 | bwd_allreduce_microstep: 54.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 13:50:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.20 | bwd_microstep: 4874.41 | bwd_inner_microstep: 4825.94 | bwd_allreduce_microstep: 48.40 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 13:50:19,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.31 | bwd_microstep: 4737.39 | bwd_inner_microstep: 4709.65 | bwd_allreduce_microstep: 27.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 13:50:28,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.39 | bwd_microstep: 5112.17 | bwd_inner_microstep: 5037.74 | bwd_allreduce_microstep: 74.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3651 [2024-07-31 13:50:36,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3589.16 | bwd_microstep: 5070.40 | bwd_inner_microstep: 4983.72 | bwd_allreduce_microstep: 86.61 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180
[2024-07-31 13:50:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 13:50:45,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.85 | bwd_microstep: 5218.50 | bwd_inner_microstep: 4813.18 | bwd_allreduce_microstep: 405.26 | step_microstep: 182.62
[2024-07-31 13:50:45,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28174.99 | bwd: 41054.55 | bwd_inner: 40156.37 | bwd_allreduce: 897.69 | step: 183.21
48%|████▊ | 594/1230 [11:38:51<12:21:37, 69.96s/it]
{'loss': 1.1624, 'learning_rate': 1.1038300563999453e-05, 'epoch': 0.48}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3954
[2024-07-31 13:50:54,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.88 | bwd_microstep: 5188.07 | bwd_inner_microstep: 5168.96 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839
[2024-07-31 13:51:03,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.20 | bwd_microstep: 5225.95 | bwd_inner_microstep: 5171.84 | bwd_allreduce_microstep: 54.04 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733
[2024-07-31 13:51:12,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.34 | bwd_microstep: 4984.38 | bwd_inner_microstep: 4964.49 | bwd_allreduce_microstep: 19.81 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3780
[2024-07-31 13:51:21,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.09 | bwd_microstep: 5134.78 | bwd_inner_microstep: 5065.62 | bwd_allreduce_microstep: 69.10 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161
[2024-07-31 13:51:29,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.96 | bwd_microstep: 5226.94 | bwd_inner_microstep: 4821.02 | bwd_allreduce_microstep: 405.85 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 13:51:38,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.34 | bwd_microstep: 5149.25 | bwd_inner_microstep: 5075.27 | bwd_allreduce_microstep: 73.90 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691
[2024-07-31 13:51:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.91 | bwd_microstep: 5071.97 | bwd_inner_microstep: 5010.22 | bwd_allreduce_microstep: 61.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688
[2024-07-31 13:51:56,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81
[2024-07-31 13:51:56,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.77 | bwd_microstep: 4934.83 | bwd_inner_microstep: 4908.36 | bwd_allreduce_microstep: 26.40 | step_microstep: 182.03
[2024-07-31 13:51:56,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29293.39 | bwd: 40916.15 | bwd_inner: 40185.72 | bwd_allreduce: 729.92 | step: 182.62
48%|████▊ | 595/1230 [11:40:02<12:22:17, 70.14s/it]
{'loss': 1.1954, 'learning_rate': 1.1012105774481446e-05, 'epoch': 0.48}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3867
[2024-07-31 13:52:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.08 | bwd_microstep: 5264.17 | bwd_inner_microstep: 5208.74 | bwd_allreduce_microstep: 55.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879
[2024-07-31 13:52:14,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.42 | bwd_microstep: 5300.15 | bwd_inner_microstep: 5240.42 | bwd_allreduce_microstep: 59.66 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3772
[2024-07-31 13:52:23,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.00 | bwd_microstep: 5209.50 | bwd_inner_microstep: 5134.19 | bwd_allreduce_microstep: 75.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773
[2024-07-31 13:52:31,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.76 | bwd_microstep: 5202.86 | bwd_inner_microstep: 5140.97 | bwd_allreduce_microstep: 61.83 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729
[2024-07-31 13:52:40,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.91 | bwd_microstep: 5107.29 | bwd_inner_microstep: 5061.05 | bwd_allreduce_microstep: 46.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645
[2024-07-31 13:52:49,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.26 | bwd_microstep: 5121.86 | bwd_inner_microstep: 5050.35 | bwd_allreduce_microstep: 71.45 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155
[2024-07-31 13:52:57,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3007.45 | bwd_microstep: 4890.71 | bwd_inner_microstep: 4514.76 | bwd_allreduce_microstep: 375.88 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2134
[2024-07-31 13:53:06,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 13:53:06,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.12 | bwd_microstep: 5111.03 | bwd_inner_microstep: 4713.97 | bwd_allreduce_microstep: 396.99 | step_microstep: 183.65
[2024-07-31 13:53:06,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28302.89 | bwd: 41207.55 | bwd_inner: 40064.37 | bwd_allreduce: 1142.69 | step: 184.34
48%|████▊ | 596/1230 [11:41:12<12:20:10, 70.05s/it]
{'loss': 1.1636, 'learning_rate': 1.0985903966460115e-05, 'epoch': 0.48}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 13:53:15,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.40 | bwd_microstep: 5548.90 | bwd_inner_microstep: 5397.37 | bwd_allreduce_microstep: 151.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820
[2024-07-31 13:53:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3244.79 | bwd_microstep: 4869.28 | bwd_inner_microstep: 4846.20 | bwd_allreduce_microstep: 23.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639
[2024-07-31 13:53:32,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.72 | bwd_microstep: 5114.73 | bwd_inner_microstep: 5038.14 | bwd_allreduce_microstep: 76.52 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613
[2024-07-31 13:53:41,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.55 | bwd_microstep: 5166.90 | bwd_inner_microstep: 5091.51 | bwd_allreduce_microstep: 75.33 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765
[2024-07-31 13:53:49,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.74 | bwd_microstep: 5011.04 | bwd_inner_microstep: 4991.70 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1097
[2024-07-31 13:53:58,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.75 | bwd_microstep: 5152.34 | bwd_inner_microstep: 4754.97 | bwd_allreduce_microstep: 397.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670
[2024-07-31 13:54:07,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.30 | bwd_microstep: 4892.03 | bwd_inner_microstep: 4872.68 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 13:54:16,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 13:54:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.07 | bwd_microstep: 5250.29 | bwd_inner_microstep: 4842.47 | bwd_allreduce_microstep: 407.75 | step_microstep: 181.67
[2024-07-31 13:54:16,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28565.22 | bwd: 41005.49 | bwd_inner: 39834.98 | bwd_allreduce: 1170.02 | step: 182.25
49%|████▊ | 597/1230 [11:42:21<12:18:32, 70.00s/it]
{'loss': 1.1576, 'learning_rate': 1.0959695321633346e-05, 'epoch': 0.49}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2301
[2024-07-31 13:54:24,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.57 | bwd_microstep: 5311.46 | bwd_inner_microstep: 4900.99 | bwd_allreduce_microstep: 410.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2318
[2024-07-31 13:54:33,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.65 | bwd_microstep: 5171.75 | bwd_inner_microstep: 4763.82 | bwd_allreduce_microstep: 407.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 13:54:42,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.75 | bwd_microstep: 5155.18 | bwd_inner_microstep: 5075.96 | bwd_allreduce_microstep: 79.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 13:54:51,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.52 | bwd_microstep: 5126.64 | bwd_inner_microstep: 5062.07 | bwd_allreduce_microstep: 64.50 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3622
[2024-07-31 13:54:59,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.33 | bwd_microstep: 5132.89 | bwd_inner_microstep: 5038.82 | bwd_allreduce_microstep: 94.00 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 13:55:07,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3003.35 | bwd_microstep: 4867.32 | bwd_inner_microstep: 4491.57 | bwd_allreduce_microstep: 375.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3663
[2024-07-31 13:55:16,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.90 | bwd_microstep: 4918.86 | bwd_inner_microstep: 4892.63 | bwd_allreduce_microstep: 26.17 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164
[2024-07-31 13:55:25,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 13:55:25,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.59 | bwd_microstep: 5163.10 | bwd_inner_microstep: 4759.09 | bwd_allreduce_microstep: 403.95 | step_microstep: 181.13
[2024-07-31 13:55:25,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28131.55 | bwd: 40847.17 | bwd_inner: 38984.88 | bwd_allreduce: 1861.81 | step: 181.71
49%|████▊ | 598/1230 [11:43:31<12:15:10, 69.80s/it]
{'loss': 1.1737, 'learning_rate': 1.093348002174643e-05, 'epoch': 0.49}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4065
[2024-07-31 13:55:34,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.68 | bwd_microstep: 5500.45 | bwd_inner_microstep: 5455.51 | bwd_allreduce_microstep: 44.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3975
[2024-07-31 13:55:43,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.25 | bwd_microstep: 5066.64 | bwd_inner_microstep: 5047.22 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290
[2024-07-31 13:55:51,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.27 | bwd_microstep: 5117.00 | bwd_inner_microstep: 4719.19 | bwd_allreduce_microstep: 397.74 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734
[2024-07-31 13:56:00,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.92 | bwd_microstep: 4996.40 | bwd_inner_microstep: 4975.50 | bwd_allreduce_microstep: 20.83 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173
[2024-07-31 13:56:09,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.74 | bwd_microstep: 5178.42 | bwd_inner_microstep: 4776.55 | bwd_allreduce_microstep: 401.80 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648
[2024-07-31 13:56:17,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3208.83 | bwd_microstep: 4791.34 | bwd_inner_microstep: 4754.27 | bwd_allreduce_microstep: 37.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3662
[2024-07-31 13:56:26,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.20 | bwd_microstep: 5225.17 | bwd_inner_microstep: 5205.77 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661
[2024-07-31 13:56:34,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 13:56:34,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.24 | bwd_microstep: 5039.88 | bwd_inner_microstep: 4986.82 | bwd_allreduce_microstep: 52.99 | step_microstep: 181.52
[2024-07-31 13:56:34,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28340.03 | bwd: 40915.27 | bwd_inner: 39920.78 | bwd_allreduce: 994.00 | step: 182.11
49%|████▊ | 599/1230 [11:44:40<12:13:20, 69.73s/it]
{'loss': 1.1534, 'learning_rate': 1.0907258248590816e-05, 'epoch': 0.49}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 13:56:43,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3357.99 | bwd_microstep: 5187.48 | bwd_inner_microstep: 5168.38 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 13:56:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.24 | bwd_microstep: 5000.77 | bwd_inner_microstep: 4981.38 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736
[2024-07-31 13:57:01,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.11 | bwd_microstep: 5058.84 | bwd_inner_microstep: 5031.65 | bwd_allreduce_microstep: 27.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709
[2024-07-31 13:57:09,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.85 | bwd_microstep: 5171.57 | bwd_inner_microstep: 5097.55 | bwd_allreduce_microstep: 73.96 | step_microstep: 0.11
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167
[2024-07-31 13:57:18,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.71 | bwd_microstep: 5177.57 | bwd_inner_microstep: 4775.47 | bwd_allreduce_microstep: 402.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 13:57:27,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.55 | bwd_microstep: 4982.20 | bwd_inner_microstep: 4962.91 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 13:57:36,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.72 | bwd_microstep: 5001.23 | bwd_inner_microstep: 4963.26 | bwd_allreduce_microstep: 37.90 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 13:57:44,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66
[2024-07-31 13:57:44,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.27 | bwd_microstep: 5090.16 | bwd_inner_microstep: 5027.71 | bwd_allreduce_microstep: 62.38 | step_microstep: 181.55
[2024-07-31 13:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29043.35 | bwd: 40669.80 | bwd_inner: 40008.25 | bwd_allreduce: 661.06 | step: 182.26
49%|████▉ | 600/1230 [11:45:50<12:13:10, 69.83s/it]
{'loss': 1.1986, 'learning_rate': 1.0881030184002827e-05, 'epoch': 0.49}
[INFO|trainer.py:2936] 2024-07-31 13:58:11,124 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600
[INFO|configuration_utils.py:473] 2024-07-31 13:58:11,125 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/config.json
[INFO|configuration_utils.py:594] 2024-07-31 13:58:11,126 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/generation_config.json
[INFO|modeling_utils.py:2501] 2024-07-31 13:59:03,946 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2433] 2024-07-31 13:59:03,948 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-07-31 13:59:03,948 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-07-31 13:59:03,948 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/added_tokens.json
[2024-07-31 13:59:03,989] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
[2024-07-31 13:59:04,706] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-31 13:59:04,707] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-31 13:59:06,485] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-31 13:59:06,626] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-31 14:00:10,045] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-31 14:00:10,045] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-07-31 14:00:10,739] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
[INFO|trainer.py:3028] 2024-07-31 14:00:10,796 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/checkpoint-400] due to args.save_total_limit
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625
[2024-07-31 14:00:48,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.06 | bwd_microstep: 5522.92 | bwd_inner_microstep: 5355.83 | bwd_allreduce_microstep: 167.02 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3828
[2024-07-31 14:00:56,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.62 | bwd_microstep: 5100.68 | bwd_inner_microstep: 5058.47 | bwd_allreduce_microstep: 42.14 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3787
[2024-07-31 14:01:05,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.28 | bwd_microstep: 5154.93 | bwd_inner_microstep: 5088.94 | bwd_allreduce_microstep: 65.92 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3103
[2024-07-31 14:01:14,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.43 | bwd_microstep: 5071.68 | bwd_inner_microstep: 4812.36 | bwd_allreduce_microstep: 259.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716
[2024-07-31 14:01:22,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.26 | bwd_microstep: 4963.74 | bwd_inner_microstep: 4944.41 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695
[2024-07-31 14:01:31,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.07 | bwd_microstep: 4886.43 | bwd_inner_microstep: 4863.29 | bwd_allreduce_microstep: 23.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 14:01:40,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.37 | bwd_microstep: 5163.54 | bwd_inner_microstep: 5087.32 | bwd_allreduce_microstep: 76.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706
[2024-07-31 14:01:49,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 14:01:49,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.07 | bwd_microstep: 4886.88 | bwd_inner_microstep: 4867.38 | bwd_allreduce_microstep: 19.43 | step_microstep: 181.42
[2024-07-31 14:01:49,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29111.07 | bwd: 40750.77 | bwd_inner: 40077.95 | bwd_allreduce: 672.34 | step: 182.01
49%|████▉ | 601/1230 [11:49:54<21:19:59, 122.10s/it]
{'loss': 1.1867, 'learning_rate': 1.0854796009862434e-05, 'epoch': 0.49}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3910
[2024-07-31 14:01:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.15 | bwd_microstep: 5149.21 | bwd_inner_microstep: 5114.12 | bwd_allreduce_microstep: 35.02 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3783
[2024-07-31 14:02:06,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.82 | bwd_microstep: 5016.85 | bwd_inner_microstep: 4994.02 | bwd_allreduce_microstep: 22.77 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255
[2024-07-31 14:02:15,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.01 | bwd_microstep: 5090.81 | bwd_inner_microstep: 4697.03 | bwd_allreduce_microstep: 393.72 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760
[2024-07-31 14:02:24,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.16 | bwd_microstep: 5070.66 | bwd_inner_microstep: 5042.66 | bwd_allreduce_microstep: 27.93 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 14:02:32,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.06 | bwd_microstep: 5170.04 | bwd_inner_microstep: 4767.78 | bwd_allreduce_microstep: 402.19 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716
[2024-07-31 14:02:41,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.82 | bwd_microstep: 5001.54 | bwd_inner_microstep: 4979.77 | bwd_allreduce_microstep: 21.70 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742
[2024-07-31 14:02:50,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.87 | bwd_microstep: 5042.32 | bwd_inner_microstep: 5002.30 | bwd_allreduce_microstep: 39.94 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822
[2024-07-31 14:02:58,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72
[2024-07-31 14:02:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.56 | bwd_microstep: 5023.05 | bwd_inner_microstep: 4991.55 | bwd_allreduce_microstep: 31.43 | step_microstep: 181.62
[2024-07-31 14:02:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29048.34 | bwd: 40564.47 | bwd_inner: 39589.16 | bwd_allreduce: 974.81 | step: 182.21
49%|████▉ | 602/1230 [11:51:04<18:34:11, 106.45s/it]
{'loss': 1.2107, 'learning_rate': 1.0828555908091958e-05, 'epoch': 0.49}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589
[2024-07-31 14:03:08,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.58 | bwd_microstep: 5379.87 | bwd_inner_microstep: 5266.60 | bwd_allreduce_microstep: 113.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2319
[2024-07-31 14:03:16,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.29 | bwd_microstep: 5246.50 | bwd_inner_microstep: 4839.29 | bwd_allreduce_microstep: 407.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756
[2024-07-31 14:03:25,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.39 | bwd_microstep: 5118.84 | bwd_inner_microstep: 5073.59 | bwd_allreduce_microstep: 45.18 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210
[2024-07-31 14:03:34,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.15 | bwd_microstep: 5159.17 | bwd_inner_microstep: 4759.09 | bwd_allreduce_microstep: 400.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738
[2024-07-31 14:03:43,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.97 | bwd_microstep: 5029.74 | bwd_inner_microstep: 5004.81 | bwd_allreduce_microstep: 24.86 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200
[2024-07-31 14:03:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.06 | bwd_microstep: 5104.03 | bwd_inner_microstep: 4707.17 | bwd_allreduce_microstep: 396.80 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3697
[2024-07-31 14:04:00,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3423.68 | bwd_microstep: 4780.96 | bwd_inner_microstep: 4761.54 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3673
[2024-07-31 14:04:08,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 14:04:08,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.78 | bwd_microstep: 5062.79 | bwd_inner_microstep: 4992.10 | bwd_allreduce_microstep: 70.62 | step_microstep: 182.15
[2024-07-31 14:04:08,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28658.80 | bwd: 40881.89 | bwd_inner: 39404.14 | bwd_allreduce: 1477.26 | step: 182.73
49%|████▉ | 603/1230 [11:52:14<16:37:44, 95.48s/it]
{'loss': 1.2043, 'learning_rate': 1.080231006065483e-05, 'epoch': 0.49}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3909
[2024-07-31 14:04:18,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.25 | bwd_microstep: 5511.06 | bwd_inner_microstep: 5442.56 | bwd_allreduce_microstep: 68.43 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671
[2024-07-31 14:04:26,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.94 | bwd_microstep: 5187.68 | bwd_inner_microstep: 5112.54 | bwd_allreduce_microstep: 75.07 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822
[2024-07-31 14:04:35,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.93 | bwd_microstep: 5036.69 | bwd_inner_microstep: 5017.35 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620
[2024-07-31 14:04:44,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.85 | bwd_microstep: 5309.92 | bwd_inner_microstep: 5213.95 | bwd_allreduce_microstep: 95.90 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 14:04:52,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.55 | bwd_microstep: 4851.38 | bwd_inner_microstep: 4804.08 | bwd_allreduce_microstep: 47.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774
[2024-07-31 14:05:01,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.41 | bwd_microstep: 4997.83 | bwd_inner_microstep: 4964.45 | bwd_allreduce_microstep: 33.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716
[2024-07-31 14:05:10,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.87 | bwd_microstep: 5105.18 | bwd_inner_microstep: 5059.09 | bwd_allreduce_microstep: 46.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 14:05:19,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 14:05:19,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.58 | bwd_microstep: 5157.60 | bwd_inner_microstep: 5082.50 | bwd_allreduce_microstep: 75.02 | step_microstep: 181.98
[2024-07-31 14:05:19,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28746.28 | bwd: 41157.33 | bwd_inner: 40696.46 | bwd_allreduce: 460.37 | step: 182.59
49%|████▉ | 604/1230 [11:53:24<15:17:09, 87.91s/it]
{'loss': 1.1512, 'learning_rate': 1.0776058649554336e-05, 'epoch': 0.49}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4091
[2024-07-31 14:05:28,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.31 | bwd_microstep: 5263.78 | bwd_inner_microstep: 5237.00 | bwd_allreduce_microstep: 26.71 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270
[2024-07-31 14:05:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.59 | bwd_microstep: 5036.40 | bwd_inner_microstep: 4647.84 | bwd_allreduce_microstep: 388.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222
[2024-07-31 14:05:44,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.77 | bwd_microstep: 5181.39 | bwd_inner_microstep: 4777.09 | bwd_allreduce_microstep: 404.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614
[2024-07-31 14:05:53,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.02 | bwd_microstep: 5316.06 | bwd_inner_microstep: 5209.70 | bwd_allreduce_microstep: 106.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 14:06:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.85 | bwd_microstep: 4974.20 | bwd_inner_microstep: 4954.84 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 14:06:11,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.05 | bwd_microstep: 4974.52 | bwd_inner_microstep: 4955.26 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 14:06:20,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.34 | bwd_microstep: 5047.02 | bwd_inner_microstep: 4983.47 | bwd_allreduce_microstep: 63.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680
[2024-07-31 14:06:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 14:06:29,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.53 | bwd_microstep: 4931.56 | bwd_inner_microstep: 4903.14 | bwd_allreduce_microstep: 28.36 | step_microstep: 411.35
[2024-07-31 14:06:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28761.36 | bwd: 40724.91 | bwd_inner: 39668.28 | bwd_allreduce: 1056.14 | step: 411.92
49%|████▉ | 605/1230 [11:54:35<14:19:53, 82.55s/it]
{'loss': 1.1073, 'learning_rate': 1.0749801856832325e-05, 'epoch': 0.49}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2450
[2024-07-31 14:06:38,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.78 | bwd_microstep: 5275.93 | bwd_inner_microstep: 4870.78 | bwd_allreduce_microstep: 405.08 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3821
[2024-07-31 14:06:46,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.35 | bwd_microstep: 5038.15 | bwd_inner_microstep: 5018.77 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1763
[2024-07-31 14:06:55,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.99 | bwd_microstep: 5247.65 | bwd_inner_microstep: 4839.44 | bwd_allreduce_microstep: 408.14 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775
[2024-07-31 14:07:04,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.01 | bwd_microstep: 5008.13 | bwd_inner_microstep: 4988.80 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195
[2024-07-31 14:07:13,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.23 | bwd_microstep: 5121.14 | bwd_inner_microstep: 4723.95 | bwd_allreduce_microstep: 397.13 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 14:07:21,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.96 | bwd_microstep: 5252.02 | bwd_inner_microstep: 4846.11 | bwd_allreduce_microstep: 405.84 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3659
[2024-07-31 14:07:29,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3081.09 | bwd_microstep: 4875.48 | bwd_inner_microstep: 4827.78 | bwd_allreduce_microstep: 47.63 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2160
[2024-07-31 14:07:38,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 14:07:38,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3339.98 | bwd_microstep: 5545.73 | bwd_inner_microstep: 4956.87 | bwd_allreduce_microstep: 588.78 | step_microstep: 181.11
[2024-07-31 14:07:38,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28047.29 | bwd: 41364.20 | bwd_inner: 39072.44 | bwd_allreduce: 2291.26 | step: 181.70
49%|████▉ | 606/1230 [11:55:44<13:38:36, 78.71s/it]
{'loss': 1.1603, 'learning_rate': 1.0723539864567983e-05, 'epoch': 0.49}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2064
[2024-07-31 14:07:48,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.60 | bwd_microstep: 5659.93 | bwd_inner_microstep: 5226.25 | bwd_allreduce_microstep: 433.61 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575
[2024-07-31 14:07:56,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.05 | bwd_microstep: 4874.57 | bwd_inner_microstep: 4816.46 | bwd_allreduce_microstep: 58.05 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761
[2024-07-31 14:08:05,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.25 | bwd_microstep: 5009.33 | bwd_inner_microstep: 4989.66 | bwd_allreduce_microstep: 19.60 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234
[2024-07-31 14:08:13,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.54 | bwd_microstep: 5139.53 | bwd_inner_microstep: 4740.84 | bwd_allreduce_microstep: 398.57 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739
[2024-07-31 14:08:21,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.31 | bwd_microstep: 4804.07 | bwd_inner_microstep: 4784.72 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 14:08:30,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.41 | bwd_microstep: 5159.45 | bwd_inner_microstep: 5103.60 | bwd_allreduce_microstep: 55.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 14:08:38,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.32 | bwd_microstep: 4733.08 | bwd_inner_microstep: 4707.82 | bwd_allreduce_microstep: 25.19 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0,
dynamic token length: 3687 [2024-07-31 14:08:47,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 14:08:47,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.67 | bwd_microstep: 5047.81 | bwd_inner_microstep: 4993.74 | bwd_allreduce_microstep: 54.00 | step_microstep: 181.58 [2024-07-31 14:08:47,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27721.05 | bwd: 40427.73 | bwd_inner: 39363.02 | bwd_allreduce: 1064.18 | step: 182.16 49%|████▉ | 607/1230 [11:56:53<13:05:24, 75.64s/it] {'loss': 1.1944, 'learning_rate': 1.0697272854876535e-05, 'epoch': 0.49} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3885 [2024-07-31 14:08:56,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3831.43 | bwd_microstep: 5221.13 | bwd_inner_microstep: 5187.63 | bwd_allreduce_microstep: 33.43 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2806 [2024-07-31 14:09:05,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.79 | bwd_microstep: 5266.19 | bwd_inner_microstep: 4856.67 | bwd_allreduce_microstep: 409.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 14:09:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.88 | bwd_microstep: 5147.14 | bwd_inner_microstep: 5072.48 | bwd_allreduce_microstep: 74.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 14:09:22,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.43 | bwd_microstep: 5130.58 | bwd_inner_microstep: 5055.82 | bwd_allreduce_microstep: 74.69 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2175 [2024-07-31 14:09:31,473] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.34 | bwd_microstep: 5123.90 | bwd_inner_microstep: 4728.18 | bwd_allreduce_microstep: 395.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 14:09:40,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.92 | bwd_microstep: 5017.09 | bwd_inner_microstep: 4962.58 | bwd_allreduce_microstep: 54.44 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1113 [2024-07-31 14:09:48,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.00 | bwd_microstep: 5127.26 | bwd_inner_microstep: 4734.12 | bwd_allreduce_microstep: 393.07 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2910 [2024-07-31 14:09:56,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 14:09:56,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3057.25 | bwd_microstep: 4888.75 | bwd_inner_microstep: 4621.62 | bwd_allreduce_microstep: 267.07 | step_microstep: 181.45 [2024-07-31 14:09:56,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28204.94 | bwd: 40922.03 | bwd_inner: 39219.04 | bwd_allreduce: 1702.50 | step: 182.03 49%|████▉ | 608/1230 [11:58:02<12:44:54, 73.79s/it] {'loss': 1.1842, 'learning_rate': 1.0671001009908015e-05, 'epoch': 0.49} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3886 [2024-07-31 14:10:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3837.75 | bwd_microstep: 5349.30 | bwd_inner_microstep: 5298.26 | bwd_allreduce_microstep: 50.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3801 [2024-07-31 14:10:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.27 |
bwd_microstep: 5269.44 | bwd_inner_microstep: 5205.78 | bwd_allreduce_microstep: 63.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-31 14:10:23,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.46 | bwd_microstep: 5103.14 | bwd_inner_microstep: 5036.90 | bwd_allreduce_microstep: 66.17 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2076 [2024-07-31 14:10:32,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.55 | bwd_microstep: 5233.85 | bwd_inner_microstep: 4827.98 | bwd_allreduce_microstep: 405.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 14:10:40,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.03 | bwd_microstep: 4994.78 | bwd_inner_microstep: 4610.71 | bwd_allreduce_microstep: 384.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 14:10:49,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.94 | bwd_microstep: 4966.98 | bwd_inner_microstep: 4947.60 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-31 14:10:57,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.10 | bwd_microstep: 5160.57 | bwd_inner_microstep: 4758.88 | bwd_allreduce_microstep: 401.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 14:11:06,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 14:11:06,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.67 | bwd_microstep: 4890.67 | bwd_inner_microstep: 4871.36 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.31 [2024-07-31 
14:11:06,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28566.69 | bwd: 40968.73 | bwd_inner: 39557.42 | bwd_allreduce: 1410.83 | step: 181.91 50%|████▉ | 609/1230 [11:59:12<12:31:30, 72.61s/it] {'loss': 1.1428, 'learning_rate': 1.0644724511845976e-05, 'epoch': 0.5} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3907 [2024-07-31 14:11:15,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.17 | bwd_microstep: 5118.20 | bwd_inner_microstep: 5084.30 | bwd_allreduce_microstep: 33.84 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3809 [2024-07-31 14:11:23,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.79 | bwd_microstep: 4925.54 | bwd_inner_microstep: 4906.14 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3739 [2024-07-31 14:11:31,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3136.08 | bwd_microstep: 4856.30 | bwd_inner_microstep: 4824.77 | bwd_allreduce_microstep: 31.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-31 14:11:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.86 | bwd_microstep: 5025.91 | bwd_inner_microstep: 5006.60 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 14:11:49,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.39 | bwd_microstep: 5228.83 | bwd_inner_microstep: 5143.11 | bwd_allreduce_microstep: 85.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 14:11:58,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.03 |
bwd_microstep: 4949.65 | bwd_inner_microstep: 4915.38 | bwd_allreduce_microstep: 34.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 14:12:06,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.46 | bwd_microstep: 5069.10 | bwd_inner_microstep: 4675.86 | bwd_allreduce_microstep: 393.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 14:12:15,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-31 14:12:15,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.31 | bwd_microstep: 4887.77 | bwd_inner_microstep: 4838.59 | bwd_allreduce_microstep: 49.11 | step_microstep: 182.98 [2024-07-31 14:12:15,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28427.00 | bwd: 40061.28 | bwd_inner: 39394.69 | bwd_allreduce: 666.11 | step: 183.56 50%|████▉ | 610/1230 [12:00:21<12:18:34, 71.48s/it] {'loss': 1.1471, 'learning_rate': 1.0618443542906251e-05, 'epoch': 0.5} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4043 [2024-07-31 14:12:24,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.72 | bwd_microstep: 5388.12 | bwd_inner_microstep: 5369.03 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3907 [2024-07-31 14:12:33,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.73 | bwd_microstep: 5082.92 | bwd_inner_microstep: 5063.61 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-31 14:12:42,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.01 | bwd_microstep: 5110.59 | bwd_inner_microstep: 5091.25 | bwd_allreduce_microstep:
19.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-31 14:12:50,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3436.94 | bwd_microstep: 4893.19 | bwd_inner_microstep: 4873.90 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846 [2024-07-31 14:12:59,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.15 | bwd_microstep: 5094.83 | bwd_inner_microstep: 5075.55 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 14:13:07,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.09 | bwd_microstep: 4809.25 | bwd_inner_microstep: 4771.38 | bwd_allreduce_microstep: 37.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 14:13:16,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.78 | bwd_microstep: 4999.21 | bwd_inner_microstep: 4941.41 | bwd_allreduce_microstep: 57.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 14:13:25,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 14:13:25,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.49 | bwd_microstep: 5184.87 | bwd_inner_microstep: 5106.20 | bwd_allreduce_microstep: 78.61 | step_microstep: 181.60 [2024-07-31 14:13:25,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28977.80 | bwd: 40562.97 | bwd_inner: 40292.26 | bwd_allreduce: 270.21 | step: 182.17 50%|████▉ | 611/1230 [12:01:31<12:12:26, 71.00s/it] {'loss': 1.202, 'learning_rate': 1.059215828533566e-05, 'epoch': 0.5} dynamic ViT batch size: 20, images per sample: 10.0,
dynamic token length: 3602 [2024-07-31 14:13:34,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.49 | bwd_microstep: 5164.82 | bwd_inner_microstep: 5093.85 | bwd_allreduce_microstep: 70.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-31 14:13:43,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.17 | bwd_microstep: 5215.03 | bwd_inner_microstep: 5126.57 | bwd_allreduce_microstep: 88.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-31 14:13:51,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.64 | bwd_microstep: 5172.32 | bwd_inner_microstep: 4770.09 | bwd_allreduce_microstep: 402.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249 [2024-07-31 14:13:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.82 | bwd_microstep: 5056.30 | bwd_inner_microstep: 4667.39 | bwd_allreduce_microstep: 388.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3670 [2024-07-31 14:14:08,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.56 | bwd_microstep: 5148.13 | bwd_inner_microstep: 5051.94 | bwd_allreduce_microstep: 96.12 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 14:14:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.74 | bwd_microstep: 5130.36 | bwd_inner_microstep: 4732.06 | bwd_allreduce_microstep: 398.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-31 14:14:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.52 | bwd_microstep: 5105.45 | bwd_inner_microstep: 4710.33 | bwd_allreduce_microstep: 395.06 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 14:14:34,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 14:14:34,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.72 | bwd_microstep: 5091.66 | bwd_inner_microstep: 4695.93 | bwd_allreduce_microstep: 395.66 | step_microstep: 181.47 [2024-07-31 14:14:34,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28002.58 | bwd: 41084.07 | bwd_inner: 38848.09 | bwd_allreduce: 2235.47 | step: 182.06 50%|████▉ | 612/1230 [12:02:40<12:06:23, 70.52s/it] {'loss': 1.1954, 'learning_rate': 1.0565868921410777e-05, 'epoch': 0.5} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4048 [2024-07-31 14:14:44,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3862.73 | bwd_microstep: 5324.25 | bwd_inner_microstep: 5305.07 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-31 14:14:52,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.60 | bwd_microstep: 5092.30 | bwd_inner_microstep: 5002.16 | bwd_allreduce_microstep: 90.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-31 14:15:01,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.33 | bwd_microstep: 5019.68 | bwd_inner_microstep: 4985.66 | bwd_allreduce_microstep: 33.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3622 [2024-07-31 14:15:10,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.21 | bwd_microstep: 5114.80 | bwd_inner_microstep: 5026.66 | bwd_allreduce_microstep: 88.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length:
3637 [2024-07-31 14:15:18,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.89 | bwd_microstep: 5137.47 | bwd_inner_microstep: 5060.67 | bwd_allreduce_microstep: 76.73 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 637 [2024-07-31 14:15:27,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.74 | bwd_microstep: 5169.37 | bwd_inner_microstep: 4771.50 | bwd_allreduce_microstep: 397.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 14:15:36,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.57 | bwd_microstep: 5184.87 | bwd_inner_microstep: 4784.73 | bwd_allreduce_microstep: 400.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 14:15:45,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 14:15:45,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.09 | bwd_microstep: 5051.63 | bwd_inner_microstep: 4987.46 | bwd_allreduce_microstep: 64.10 | step_microstep: 181.31 [2024-07-31 14:15:45,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28794.05 | bwd: 41094.35 | bwd_inner: 39923.84 | bwd_allreduce: 1170.03 | step: 181.89 50%|████▉ | 613/1230 [12:03:50<12:04:16, 70.43s/it] {'loss': 1.1587, 'learning_rate': 1.0539575633436645e-05, 'epoch': 0.5} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2567 [2024-07-31 14:15:53,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.17 | bwd_microstep: 5302.02 | bwd_inner_microstep: 4893.84 | bwd_allreduce_microstep: 408.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815 [2024-07-31 14:16:02,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms)
| fwd_microstep: 3590.04 | bwd_microstep: 5145.10 | bwd_inner_microstep: 5101.03 | bwd_allreduce_microstep: 44.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-31 14:16:11,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.95 | bwd_microstep: 5029.47 | bwd_inner_microstep: 5010.12 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 14:16:20,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.36 | bwd_microstep: 5174.24 | bwd_inner_microstep: 5095.92 | bwd_allreduce_microstep: 78.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-31 14:16:28,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.12 | bwd_microstep: 4987.35 | bwd_inner_microstep: 4956.26 | bwd_allreduce_microstep: 31.03 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1175 [2024-07-31 14:16:36,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2996.55 | bwd_microstep: 5011.85 | bwd_inner_microstep: 4628.80 | bwd_allreduce_microstep: 382.98 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2175 [2024-07-31 14:16:45,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3304.27 | bwd_microstep: 4997.02 | bwd_inner_microstep: 4608.69 | bwd_allreduce_microstep: 388.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-31 14:16:54,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 14:16:54,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.33 | bwd_microstep: 5110.83 | bwd_inner_microstep: 4713.81 | bwd_allreduce_microstep: 396.95 | step_microstep: 
182.52 [2024-07-31 14:16:54,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27984.70 | bwd: 40757.85 | bwd_inner: 39008.41 | bwd_allreduce: 1748.95 | step: 183.10 50%|████▉ | 614/1230 [12:05:00<11:58:54, 70.02s/it] {'loss': 1.164, 'learning_rate': 1.051327860374552e-05, 'epoch': 0.5} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3803 [2024-07-31 14:17:03,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.25 | bwd_microstep: 5589.32 | bwd_inner_microstep: 5487.74 | bwd_allreduce_microstep: 101.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-31 14:17:12,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.00 | bwd_microstep: 5028.26 | bwd_inner_microstep: 5008.34 | bwd_allreduce_microstep: 19.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3645 [2024-07-31 14:17:21,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.43 | bwd_microstep: 5163.44 | bwd_inner_microstep: 5071.17 | bwd_allreduce_microstep: 92.21 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3677 [2024-07-31 14:17:30,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.42 | bwd_microstep: 5284.96 | bwd_inner_microstep: 5209.04 | bwd_allreduce_microstep: 75.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 14:17:38,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.14 | bwd_microstep: 5150.65 | bwd_inner_microstep: 5097.06 | bwd_allreduce_microstep: 53.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 14:17:47,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep:
3633.43 | bwd_microstep: 5187.74 | bwd_inner_microstep: 5107.43 | bwd_allreduce_microstep: 80.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 14:17:56,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.68 | bwd_microstep: 5055.65 | bwd_inner_microstep: 5011.15 | bwd_allreduce_microstep: 44.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-31 14:18:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 14:18:05,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.93 | bwd_microstep: 5058.53 | bwd_inner_microstep: 4664.86 | bwd_allreduce_microstep: 393.60 | step_microstep: 181.76 [2024-07-31 14:18:05,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29219.19 | bwd: 41518.53 | bwd_inner: 40656.72 | bwd_allreduce: 861.33 | step: 182.34 50%|█████ | 615/1230 [12:06:11<12:00:58, 70.34s/it] {'loss': 1.2268, 'learning_rate': 1.0486978014695606e-05, 'epoch': 0.5} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3904 [2024-07-31 14:18:14,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.37 | bwd_microstep: 5508.74 | bwd_inner_microstep: 5401.65 | bwd_allreduce_microstep: 107.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-31 14:18:23,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.71 | bwd_microstep: 5214.57 | bwd_inner_microstep: 4809.71 | bwd_allreduce_microstep: 404.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-31 14:18:31,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.16 | bwd_microstep: 5129.14 | bwd_inner_microstep: 5083.19 |
bwd_allreduce_microstep: 45.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3787 [2024-07-31 14:18:40,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.15 | bwd_microstep: 5026.96 | bwd_inner_microstep: 5007.67 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 14:18:49,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.58 | bwd_microstep: 4991.23 | bwd_inner_microstep: 4941.93 | bwd_allreduce_microstep: 49.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-31 14:18:58,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.65 | bwd_microstep: 5067.35 | bwd_inner_microstep: 5047.98 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 14:19:06,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.31 | bwd_microstep: 4944.46 | bwd_inner_microstep: 4920.03 | bwd_allreduce_microstep: 24.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 14:19:15,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 14:19:15,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.67 | bwd_microstep: 5059.49 | bwd_inner_microstep: 4992.11 | bwd_allreduce_microstep: 67.32 | step_microstep: 182.56 [2024-07-31 14:19:15,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29192.51 | bwd: 40941.91 | bwd_inner: 40204.21 | bwd_allreduce: 737.23 | step: 183.14 50%|█████ | 616/1230 [12:07:21<12:00:12, 70.38s/it] {'loss': 1.1936, 'learning_rate': 1.0460674048669783e-05, 'epoch': 0.5} dynamic ViT batch size: 20,
images per sample: 10.0, dynamic token length: 4096 [2024-07-31 14:19:24,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.20 | bwd_microstep: 5540.12 | bwd_inner_microstep: 5484.58 | bwd_allreduce_microstep: 55.48 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574 [2024-07-31 14:19:33,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.24 | bwd_microstep: 5322.24 | bwd_inner_microstep: 5214.01 | bwd_allreduce_microstep: 108.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-31 14:19:42,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.77 | bwd_microstep: 5223.14 | bwd_inner_microstep: 4817.06 | bwd_allreduce_microstep: 406.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 14:19:51,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.81 | bwd_microstep: 5176.72 | bwd_inner_microstep: 4776.13 | bwd_allreduce_microstep: 400.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-31 14:19:59,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.45 | bwd_microstep: 4826.35 | bwd_inner_microstep: 4806.94 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3658 [2024-07-31 14:20:08,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.02 | bwd_microstep: 5157.70 | bwd_inner_microstep: 5073.07 | bwd_allreduce_microstep: 84.56 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2890 [2024-07-31 14:20:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.50 | bwd_microstep: 5084.06 | bwd_inner_microstep: 4686.75 | bwd_allreduce_microstep: 
397.24 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2136 [2024-07-31 14:20:25,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 14:20:25,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.56 | bwd_microstep: 5111.10 | bwd_inner_microstep: 4713.74 | bwd_allreduce_microstep: 397.30 | step_microstep: 182.29 [2024-07-31 14:20:25,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28352.44 | bwd: 41441.42 | bwd_inner: 39572.21 | bwd_allreduce: 1868.72 | step: 182.88 50%|█████ | 617/1230 [12:08:31<11:58:15, 70.30s/it] {'loss': 1.0972, 'learning_rate': 1.0434366888074363e-05, 'epoch': 0.5} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3924 [2024-07-31 14:20:35,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3868.44 | bwd_microstep: 5456.33 | bwd_inner_microstep: 5395.25 | bwd_allreduce_microstep: 61.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829 [2024-07-31 14:20:44,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.67 | bwd_microstep: 5231.33 | bwd_inner_microstep: 5189.65 | bwd_allreduce_microstep: 41.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-31 14:20:52,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.59 | bwd_microstep: 5175.44 | bwd_inner_microstep: 4775.20 | bwd_allreduce_microstep: 400.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 14:21:01,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.23 | bwd_microstep: 5159.50 | bwd_inner_microstep: 5082.37 | bwd_allreduce_microstep: 77.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample:
10.0, dynamic token length: 3739 [2024-07-31 14:21:10,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.54 | bwd_microstep: 5133.00 | bwd_inner_microstep: 5082.81 | bwd_allreduce_microstep: 50.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 14:21:19,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.62 | bwd_microstep: 4975.38 | bwd_inner_microstep: 4955.99 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 14:21:27,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.14 | bwd_microstep: 5171.39 | bwd_inner_microstep: 5098.28 | bwd_allreduce_microstep: 73.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-31 14:21:36,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 14:21:36,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.28 | bwd_microstep: 5077.96 | bwd_inner_microstep: 4683.86 | bwd_allreduce_microstep: 394.03 | step_microstep: 181.88 [2024-07-31 14:21:36,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29227.44 | bwd: 41380.30 | bwd_inner: 40263.35 | bwd_allreduce: 1116.46 | step: 182.45 50%|█████ | 618/1230 [12:09:42<11:59:02, 70.49s/it] {'loss': 1.1468, 'learning_rate': 1.0408056715337795e-05, 'epoch': 0.5} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3868 [2024-07-31 14:21:45,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.86 | bwd_microstep: 5094.67 | bwd_inner_microstep: 5075.47 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4044 [2024-07-31 14:21:54,823] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3858.57 | bwd_microstep: 5320.66 | bwd_inner_microstep: 5301.33 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-31 14:22:03,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.27 | bwd_microstep: 5305.32 | bwd_inner_microstep: 4893.40 | bwd_allreduce_microstep: 411.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-31 14:22:12,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.97 | bwd_microstep: 5148.97 | bwd_inner_microstep: 5097.53 | bwd_allreduce_microstep: 51.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 14:22:20,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3154.50 | bwd_microstep: 4732.59 | bwd_inner_microstep: 4703.27 | bwd_allreduce_microstep: 29.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778 [2024-07-31 14:22:29,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.58 | bwd_microstep: 5103.64 | bwd_inner_microstep: 5062.18 | bwd_allreduce_microstep: 41.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3741 [2024-07-31 14:22:37,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.65 | bwd_microstep: 5076.76 | bwd_inner_microstep: 5008.97 | bwd_allreduce_microstep: 67.72 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2174 [2024-07-31 14:22:46,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 14:22:46,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.31 | bwd_microstep: 5078.24 | bwd_inner_microstep: 4682.86 | 
bwd_allreduce_microstep: 395.31 | step_microstep: 182.03 [2024-07-31 14:22:46,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28589.61 | bwd: 40860.83 | bwd_inner: 39824.96 | bwd_allreduce: 1035.39 | step: 182.62 50%|█████ | 619/1230 [12:10:52<11:55:40, 70.28s/it] {'loss': 1.2106, 'learning_rate': 1.0381743712909427e-05, 'epoch': 0.5} 50%|█████ | 619/1230 [12:10:52<11:55:40, 70.28s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 14:22:56,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.59 | bwd_microstep: 5801.24 | bwd_inner_microstep: 5756.40 | bwd_allreduce_microstep: 44.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3567 [2024-07-31 14:23:04,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.91 | bwd_microstep: 5088.98 | bwd_inner_microstep: 5015.95 | bwd_allreduce_microstep: 72.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 14:23:13,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.34 | bwd_microstep: 5269.85 | bwd_inner_microstep: 4860.48 | bwd_allreduce_microstep: 409.30 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3749 [2024-07-31 14:23:22,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.28 | bwd_microstep: 5235.48 | bwd_inner_microstep: 5158.88 | bwd_allreduce_microstep: 76.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 14:23:31,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.91 | bwd_microstep: 5198.24 | bwd_inner_microstep: 4792.63 | bwd_allreduce_microstep: 405.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-31 14:23:39,937] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.57 | bwd_microstep: 5174.67 | bwd_inner_microstep: 4772.59 | bwd_allreduce_microstep: 402.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3716 [2024-07-31 14:23:48,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3378.51 | bwd_microstep: 4963.91 | bwd_inner_microstep: 4912.77 | bwd_allreduce_microstep: 51.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 14:23:57,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 14:23:57,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.04 | bwd_microstep: 5026.32 | bwd_inner_microstep: 4969.96 | bwd_allreduce_microstep: 56.29 | step_microstep: 182.13 [2024-07-31 14:23:57,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28479.06 | bwd: 41758.66 | bwd_inner: 40239.60 | bwd_allreduce: 1518.58 | step: 182.72 50%|█████ | 620/1230 [12:12:02<11:55:23, 70.37s/it] {'loss': 1.2397, 'learning_rate': 1.0355428063258224e-05, 'epoch': 0.5} 50%|█████ | 620/1230 [12:12:02<11:55:23, 70.37s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2472 [2024-07-31 14:24:05,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.94 | bwd_microstep: 5262.15 | bwd_inner_microstep: 4856.20 | bwd_allreduce_microstep: 405.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-31 14:24:14,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.52 | bwd_microstep: 5181.94 | bwd_inner_microstep: 4779.55 | bwd_allreduce_microstep: 402.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-07-31 14:24:23,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.46 | 
bwd_microstep: 5213.42 | bwd_inner_microstep: 5118.23 | bwd_allreduce_microstep: 95.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 14:24:32,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.99 | bwd_microstep: 4968.45 | bwd_inner_microstep: 4935.83 | bwd_allreduce_microstep: 32.55 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 618 [2024-07-31 14:24:40,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.23 | bwd_microstep: 5213.50 | bwd_inner_microstep: 4813.39 | bwd_allreduce_microstep: 400.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677 [2024-07-31 14:24:49,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3707.07 | bwd_microstep: 4905.34 | bwd_inner_microstep: 4878.56 | bwd_allreduce_microstep: 26.70 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2131 [2024-07-31 14:24:58,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.75 | bwd_microstep: 5114.33 | bwd_inner_microstep: 4717.30 | bwd_allreduce_microstep: 396.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-31 14:25:07,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 14:25:07,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.57 | bwd_microstep: 4916.86 | bwd_inner_microstep: 4891.84 | bwd_allreduce_microstep: 24.95 | step_microstep: 182.05 [2024-07-31 14:25:07,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28893.44 | bwd: 40775.97 | bwd_inner: 38990.84 | bwd_allreduce: 1784.64 | step: 182.63 50%|█████ | 621/1230 [12:13:12<11:53:05, 70.26s/it] {'loss': 1.1623, 'learning_rate': 1.0329109948871512e-05, 'epoch': 0.5} 50%|█████ | 621/1230 
[12:13:12<11:53:05, 70.26s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 14:25:16,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3854.26 | bwd_microstep: 5344.01 | bwd_inner_microstep: 5324.96 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3821 [2024-07-31 14:25:25,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.50 | bwd_microstep: 5150.05 | bwd_inner_microstep: 5114.84 | bwd_allreduce_microstep: 35.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-31 14:25:33,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.80 | bwd_microstep: 4793.95 | bwd_inner_microstep: 4772.84 | bwd_allreduce_microstep: 21.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 14:25:42,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.65 | bwd_microstep: 5102.43 | bwd_inner_microstep: 5051.59 | bwd_allreduce_microstep: 50.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-31 14:25:50,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.61 | bwd_microstep: 5106.38 | bwd_inner_microstep: 5059.85 | bwd_allreduce_microstep: 46.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-31 14:25:59,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.89 | bwd_microstep: 5003.98 | bwd_inner_microstep: 4984.64 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 14:26:08,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.36 | bwd_microstep: 
5086.97 | bwd_inner_microstep: 5024.98 | bwd_allreduce_microstep: 61.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 14:26:17,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 14:26:17,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.54 | bwd_microstep: 5023.50 | bwd_inner_microstep: 4973.69 | bwd_allreduce_microstep: 49.75 | step_microstep: 181.71 [2024-07-31 14:26:17,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29111.50 | bwd: 40611.25 | bwd_inner: 40307.33 | bwd_allreduce: 303.44 | step: 182.29 51%|█████ | 622/1230 [12:14:23<11:51:19, 70.20s/it] {'loss': 1.1782, 'learning_rate': 1.0302789552253702e-05, 'epoch': 0.51} 51%|█████ | 622/1230 [12:14:23<11:51:19, 70.20s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2341 [2024-07-31 14:26:26,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.64 | bwd_microstep: 5285.58 | bwd_inner_microstep: 4877.32 | bwd_allreduce_microstep: 408.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-31 14:26:34,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.69 | bwd_microstep: 5092.54 | bwd_inner_microstep: 5073.20 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3888 [2024-07-31 14:26:43,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.11 | bwd_microstep: 5131.37 | bwd_inner_microstep: 5112.01 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 14:26:52,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.71 | bwd_microstep: 4890.10 | bwd_inner_microstep: 4870.68 | bwd_allreduce_microstep: 19.35 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 14:27:01,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.25 | bwd_microstep: 5043.20 | bwd_inner_microstep: 5003.31 | bwd_allreduce_microstep: 39.82 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2140 [2024-07-31 14:27:09,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.84 | bwd_microstep: 4906.37 | bwd_inner_microstep: 4529.68 | bwd_allreduce_microstep: 376.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-31 14:27:17,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.01 | bwd_microstep: 5006.78 | bwd_inner_microstep: 4987.37 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 14:27:26,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 14:27:26,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.35 | bwd_microstep: 5325.55 | bwd_inner_microstep: 5133.73 | bwd_allreduce_microstep: 191.75 | step_microstep: 181.59 [2024-07-31 14:27:26,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28688.52 | bwd: 40681.48 | bwd_inner: 39587.25 | bwd_allreduce: 1093.74 | step: 182.16 51%|█████ | 623/1230 [12:15:32<11:48:39, 70.05s/it] {'loss': 1.1776, 'learning_rate': 1.0276467055925046e-05, 'epoch': 0.51} 51%|█████ | 623/1230 [12:15:32<11:48:39, 70.05s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 14:27:36,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.42 | bwd_microstep: 5672.78 | bwd_inner_microstep: 5647.82 | bwd_allreduce_microstep: 24.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3730 [2024-07-31 14:27:45,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.42 | bwd_microstep: 5295.10 | bwd_inner_microstep: 5221.16 | bwd_allreduce_microstep: 73.87 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3770 [2024-07-31 14:27:54,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.80 | bwd_microstep: 5225.79 | bwd_inner_microstep: 5149.34 | bwd_allreduce_microstep: 76.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-31 14:28:03,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.25 | bwd_microstep: 5230.07 | bwd_inner_microstep: 4824.46 | bwd_allreduce_microstep: 405.55 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2200 [2024-07-31 14:28:11,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.31 | bwd_microstep: 5180.92 | bwd_inner_microstep: 4776.36 | bwd_allreduce_microstep: 404.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 14:28:20,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.75 | bwd_microstep: 5253.11 | bwd_inner_microstep: 4846.52 | bwd_allreduce_microstep: 406.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 14:28:29,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.86 | bwd_microstep: 4964.95 | bwd_inner_microstep: 4934.09 | bwd_allreduce_microstep: 30.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 14:28:38,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.89 [2024-07-31 14:28:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.03 | 
bwd_microstep: 4913.94 | bwd_inner_microstep: 4887.84 | bwd_allreduce_microstep: 26.03 | step_microstep: 181.78 [2024-07-31 14:28:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29265.74 | bwd: 41736.64 | bwd_inner: 40287.53 | bwd_allreduce: 1448.63 | step: 182.35 51%|█████ | 624/1230 [12:16:44<11:51:23, 70.43s/it] {'loss': 1.142, 'learning_rate': 1.0250142642420335e-05, 'epoch': 0.51} 51%|█████ | 624/1230 [12:16:44<11:51:23, 70.43s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 14:28:47,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.11 | bwd_microstep: 5236.43 | bwd_inner_microstep: 5212.67 | bwd_allreduce_microstep: 23.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3940 [2024-07-31 14:28:56,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3868.18 | bwd_microstep: 5299.43 | bwd_inner_microstep: 5263.10 | bwd_allreduce_microstep: 36.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-31 14:29:05,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.67 | bwd_microstep: 5116.23 | bwd_inner_microstep: 5070.34 | bwd_allreduce_microstep: 45.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 14:29:13,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.77 | bwd_microstep: 5168.91 | bwd_inner_microstep: 5086.22 | bwd_allreduce_microstep: 82.63 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669 [2024-07-31 14:29:22,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3347.63 | bwd_microstep: 5033.16 | bwd_inner_microstep: 4962.32 | bwd_allreduce_microstep: 70.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3666 [2024-07-31 14:29:30,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.15 | bwd_microstep: 4866.68 | bwd_inner_microstep: 4847.31 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 14:29:38,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.69 | bwd_microstep: 4802.27 | bwd_inner_microstep: 4768.45 | bwd_allreduce_microstep: 33.76 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3711 [2024-07-31 14:29:47,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 14:29:47,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.43 | bwd_microstep: 5145.87 | bwd_inner_microstep: 5066.88 | bwd_allreduce_microstep: 78.92 | step_microstep: 181.68 [2024-07-31 14:29:47,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28673.56 | bwd: 40668.95 | bwd_inner: 40277.22 | bwd_allreduce: 391.26 | step: 182.26 51%|█████ | 625/1230 [12:17:53<11:47:55, 70.21s/it] {'loss': 1.1545, 'learning_rate': 1.0223816494287673e-05, 'epoch': 0.51} 51%|█████ | 625/1230 [12:17:53<11:47:55, 70.21s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2376 [2024-07-31 14:29:57,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.49 | bwd_microstep: 5529.81 | bwd_inner_microstep: 5103.91 | bwd_allreduce_microstep: 425.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 14:30:05,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.76 | bwd_microstep: 5166.00 | bwd_inner_microstep: 5088.96 | bwd_allreduce_microstep: 76.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-31 14:30:14,663] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3776.84 | bwd_microstep: 5064.70 | bwd_inner_microstep: 5034.95 | bwd_allreduce_microstep: 29.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 14:30:23,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.96 | bwd_microstep: 5239.49 | bwd_inner_microstep: 5174.89 | bwd_allreduce_microstep: 64.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 14:30:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.97 | bwd_microstep: 5128.23 | bwd_inner_microstep: 5057.21 | bwd_allreduce_microstep: 70.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2133 [2024-07-31 14:30:40,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.05 | bwd_microstep: 5122.98 | bwd_inner_microstep: 4725.13 | bwd_allreduce_microstep: 397.78 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3666 [2024-07-31 14:30:49,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.97 | bwd_microstep: 5040.63 | bwd_inner_microstep: 4970.15 | bwd_allreduce_microstep: 70.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 14:30:58,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 14:30:58,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.56 | bwd_microstep: 4988.48 | bwd_inner_microstep: 4941.64 | bwd_allreduce_microstep: 46.77 | step_microstep: 181.40 [2024-07-31 14:30:58,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28871.52 | bwd: 41280.29 | bwd_inner: 40096.79 | bwd_allreduce: 1183.03 | step: 181.98 51%|█████ | 626/1230 [12:19:04<11:47:35, 70.29s/it] {'loss': 1.2001, 'learning_rate': 1.0197488794087188e-05, 
'epoch': 0.51} 51%|█████ | 626/1230 [12:19:04<11:47:35, 70.29s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2769 [2024-07-31 14:31:07,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.08 | bwd_microstep: 5179.63 | bwd_inner_microstep: 4778.80 | bwd_allreduce_microstep: 400.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3859 [2024-07-31 14:31:15,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.36 | bwd_microstep: 5251.29 | bwd_inner_microstep: 5198.33 | bwd_allreduce_microstep: 52.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 14:31:24,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.23 | bwd_microstep: 5138.91 | bwd_inner_microstep: 5066.89 | bwd_allreduce_microstep: 71.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-31 14:31:33,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.45 | bwd_microstep: 4995.51 | bwd_inner_microstep: 4976.14 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242 [2024-07-31 14:31:41,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.37 | bwd_microstep: 4961.17 | bwd_inner_microstep: 4577.11 | bwd_allreduce_microstep: 384.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 14:31:50,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.15 | bwd_microstep: 5231.01 | bwd_inner_microstep: 4824.01 | bwd_allreduce_microstep: 406.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 14:31:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3216.26 | bwd_microstep: 5114.53 | bwd_inner_microstep: 4720.83 | bwd_allreduce_microstep: 393.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 14:32:07,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 14:32:07,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.00 | bwd_microstep: 5019.49 | bwd_inner_microstep: 4994.75 | bwd_allreduce_microstep: 24.68 | step_microstep: 182.21 [2024-07-31 14:32:07,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28052.80 | bwd: 40891.52 | bwd_inner: 39136.78 | bwd_allreduce: 1754.25 | step: 182.79 51%|█████ | 627/1230 [12:20:13<11:43:21, 69.99s/it] {'loss': 1.2183, 'learning_rate': 1.0171159724389766e-05, 'epoch': 0.51} 51%|█████ | 627/1230 [12:20:13<11:43:21, 69.99s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2321 [2024-07-31 14:32:16,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.14 | bwd_microstep: 5640.85 | bwd_inner_microstep: 5206.96 | bwd_allreduce_microstep: 433.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-31 14:32:25,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.65 | bwd_microstep: 5079.32 | bwd_inner_microstep: 5054.04 | bwd_allreduce_microstep: 25.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249 [2024-07-31 14:32:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.94 | bwd_microstep: 5152.17 | bwd_inner_microstep: 4750.35 | bwd_allreduce_microstep: 401.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3729 [2024-07-31 14:32:43,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.27 | bwd_microstep: 5120.62 | bwd_inner_microstep: 5045.36 
| bwd_allreduce_microstep: 75.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 14:32:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.56 | bwd_microstep: 5184.14 | bwd_inner_microstep: 5102.43 | bwd_allreduce_microstep: 81.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-31 14:33:00,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.15 | bwd_microstep: 5042.61 | bwd_inner_microstep: 5023.20 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-31 14:33:09,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.54 | bwd_microstep: 5097.56 | bwd_inner_microstep: 5050.12 | bwd_allreduce_microstep: 47.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 14:33:18,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 14:33:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.05 | bwd_microstep: 5083.19 | bwd_inner_microstep: 5038.46 | bwd_allreduce_microstep: 44.65 | step_microstep: 182.64 [2024-07-31 14:33:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29184.20 | bwd: 41400.42 | bwd_inner: 40270.86 | bwd_allreduce: 1129.08 | step: 183.22 51%|█████ | 628/1230 [12:21:24<11:44:59, 70.27s/it] {'loss': 1.1695, 'learning_rate': 1.0144829467775794e-05, 'epoch': 0.51} 51%|█████ | 628/1230 [12:21:24<11:44:59, 70.27s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3905 [2024-07-31 14:33:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 6390.61 | bwd_microstep: 5158.17 | bwd_inner_microstep: 5075.70 | bwd_allreduce_microstep: 82.39 | step_microstep: 0.08 dynamic ViT batch size: 
23, images per sample: 11.5, dynamic token length: 3933 [2024-07-31 14:33:39,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.90 | bwd_microstep: 5240.57 | bwd_inner_microstep: 5206.24 | bwd_allreduce_microstep: 34.25 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-31 14:33:47,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.87 | bwd_microstep: 5319.21 | bwd_inner_microstep: 4905.47 | bwd_allreduce_microstep: 413.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 14:33:56,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.39 | bwd_microstep: 4876.55 | bwd_inner_microstep: 4836.18 | bwd_allreduce_microstep: 40.30 | step_microstep: 0.08 dynamic ViT batch size: 9, images per sample: 4.5, dynamic token length: 1580 [2024-07-31 14:34:04,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.22 | bwd_microstep: 5197.93 | bwd_inner_microstep: 4796.23 | bwd_allreduce_microstep: 401.63 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2122 [2024-07-31 14:34:13,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.77 | bwd_microstep: 5217.56 | bwd_inner_microstep: 4813.04 | bwd_allreduce_microstep: 404.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 14:34:22,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.68 | bwd_microstep: 5043.40 | bwd_inner_microstep: 5012.17 | bwd_allreduce_microstep: 31.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 14:34:31,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 14:34:31,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3730.57 | bwd_microstep: 4923.20 | bwd_inner_microstep: 4899.18 | bwd_allreduce_microstep: 23.95 | step_microstep: 181.99 [2024-07-31 14:34:31,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 31472.90 | bwd: 40976.57 | bwd_inner: 39544.14 | bwd_allreduce: 1431.93 | step: 182.58 51%|█████ | 629/1230 [12:22:37<11:51:27, 71.03s/it] {'loss': 1.1617, 'learning_rate': 1.0118498206833886e-05, 'epoch': 0.51} 51%|█████ | 629/1230 [12:22:37<11:51:27, 71.03s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4030 [2024-07-31 14:34:40,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.76 | bwd_microstep: 5375.49 | bwd_inner_microstep: 5334.69 | bwd_allreduce_microstep: 40.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-31 14:34:49,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.66 | bwd_microstep: 5239.70 | bwd_inner_microstep: 5144.32 | bwd_allreduce_microstep: 95.32 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-31 14:34:58,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.33 | bwd_microstep: 5069.76 | bwd_inner_microstep: 5044.05 | bwd_allreduce_microstep: 25.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-31 14:35:06,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.31 | bwd_microstep: 5208.39 | bwd_inner_microstep: 4802.39 | bwd_allreduce_microstep: 405.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 14:35:15,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.58 | bwd_microstep: 4998.99 | bwd_inner_microstep: 4959.79 | bwd_allreduce_microstep: 39.14 | step_microstep: 0.09 dynamic ViT batch size: 23, images per 
sample: 11.5, dynamic token length: 3712 [2024-07-31 14:35:24,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.18 | bwd_microstep: 4896.30 | bwd_inner_microstep: 4870.06 | bwd_allreduce_microstep: 26.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2125 [2024-07-31 14:35:32,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.89 | bwd_microstep: 5165.69 | bwd_inner_microstep: 4764.04 | bwd_allreduce_microstep: 401.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 14:35:40,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 14:35:40,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3178.35 | bwd_microstep: 4678.31 | bwd_inner_microstep: 4658.53 | bwd_allreduce_microstep: 19.70 | step_microstep: 182.02 [2024-07-31 14:35:40,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28495.97 | bwd: 40632.61 | bwd_inner: 39577.81 | bwd_allreduce: 1054.30 | step: 182.72 51%|█████ | 630/1230 [12:23:46<11:45:34, 70.56s/it] {'loss': 1.1075, 'learning_rate': 1.0092166124159631e-05, 'epoch': 0.51} 51%|█████ | 630/1230 [12:23:46<11:45:34, 70.56s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3902 [2024-07-31 14:35:49,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.94 | bwd_microstep: 5301.76 | bwd_inner_microstep: 5245.93 | bwd_allreduce_microstep: 55.76 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3306 [2024-07-31 14:35:58,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.68 | bwd_microstep: 5229.33 | bwd_inner_microstep: 5037.15 | bwd_allreduce_microstep: 192.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818 [2024-07-31 14:36:07,323] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.41 | bwd_microstep: 5116.15 | bwd_inner_microstep: 5074.75 | bwd_allreduce_microstep: 41.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 14:36:16,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.59 | bwd_microstep: 4991.35 | bwd_inner_microstep: 4970.58 | bwd_allreduce_microstep: 20.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 14:36:24,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.24 | bwd_microstep: 4993.48 | bwd_inner_microstep: 4974.17 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 14:36:33,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.42 | bwd_microstep: 5176.38 | bwd_inner_microstep: 5101.30 | bwd_allreduce_microstep: 75.02 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-07-31 14:36:42,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.52 | bwd_microstep: 5107.17 | bwd_inner_microstep: 4712.36 | bwd_allreduce_microstep: 394.74 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3632 [2024-07-31 14:36:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 14:36:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.48 | bwd_microstep: 5013.54 | bwd_inner_microstep: 4945.05 | bwd_allreduce_microstep: 68.43 | step_microstep: 181.36 [2024-07-31 14:36:51,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28991.17 | bwd: 40929.14 | bwd_inner: 40061.23 | bwd_allreduce: 867.42 | step: 181.94 51%|█████▏ | 631/1230 [12:24:56<11:43:28, 70.47s/it] {'loss': 1.1466, 
'learning_rate': 1.0065833402354302e-05, 'epoch': 0.51} 51%|█████▏ | 631/1230 [12:24:56<11:43:28, 70.47s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4078 [2024-07-31 14:36:59,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.56 | bwd_microstep: 5197.93 | bwd_inner_microstep: 5178.89 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3980 [2024-07-31 14:37:08,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.98 | bwd_microstep: 5062.98 | bwd_inner_microstep: 5043.59 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 14:37:17,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.76 | bwd_microstep: 5097.88 | bwd_inner_microstep: 5023.42 | bwd_allreduce_microstep: 74.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 14:37:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.61 | bwd_microstep: 5133.41 | bwd_inner_microstep: 5079.41 | bwd_allreduce_microstep: 53.92 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 14:37:34,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3245.48 | bwd_microstep: 5063.50 | bwd_inner_microstep: 4673.73 | bwd_allreduce_microstep: 389.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-31 14:37:43,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.35 | bwd_microstep: 5168.49 | bwd_inner_microstep: 4763.85 | bwd_allreduce_microstep: 404.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 14:37:51,965] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.36 | bwd_microstep: 4881.30 | bwd_inner_microstep: 4861.92 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 14:38:00,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 14:38:00,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.22 | bwd_microstep: 4899.81 | bwd_inner_microstep: 4878.36 | bwd_allreduce_microstep: 21.38 | step_microstep: 182.46 [2024-07-31 14:38:00,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28879.25 | bwd: 40505.28 | bwd_inner: 39503.12 | bwd_allreduce: 1001.67 | step: 183.06 51%|█████▏ | 632/1230 [12:26:06<11:40:04, 70.24s/it] {'loss': 1.1446, 'learning_rate': 1.0039500224023612e-05, 'epoch': 0.51} 51%|█████▏ | 632/1230 [12:26:06<11:40:04, 70.24s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-31 14:38:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.28 | bwd_microstep: 5151.05 | bwd_inner_microstep: 5122.59 | bwd_allreduce_microstep: 28.40 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3833 [2024-07-31 14:38:17,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3130.73 | bwd_microstep: 4977.46 | bwd_inner_microstep: 4935.90 | bwd_allreduce_microstep: 41.50 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-07-31 14:38:26,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.56 | bwd_microstep: 5169.08 | bwd_inner_microstep: 4766.68 | bwd_allreduce_microstep: 402.34 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2724 [2024-07-31 14:38:34,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3066.17 | 
bwd_microstep: 5001.18 | bwd_inner_microstep: 4614.27 | bwd_allreduce_microstep: 386.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3770 [2024-07-31 14:38:43,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.06 | bwd_microstep: 5200.50 | bwd_inner_microstep: 5121.95 | bwd_allreduce_microstep: 78.48 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 14:38:52,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.21 | bwd_microstep: 5072.38 | bwd_inner_microstep: 4681.02 | bwd_allreduce_microstep: 391.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 14:39:00,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.78 | bwd_microstep: 5033.98 | bwd_inner_microstep: 4993.74 | bwd_allreduce_microstep: 40.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-31 14:39:09,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 14:39:09,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.32 | bwd_microstep: 5141.53 | bwd_inner_microstep: 4746.17 | bwd_allreduce_microstep: 395.28 | step_microstep: 182.05 [2024-07-31 14:39:09,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27713.01 | bwd: 40747.16 | bwd_inner: 38982.25 | bwd_allreduce: 1764.41 | step: 182.76 51%|█████▏ | 633/1230 [12:27:15<11:34:34, 69.81s/it] {'loss': 1.1887, 'learning_rate': 1.0013166771776441e-05, 'epoch': 0.51} 51%|█████▏ | 633/1230 [12:27:15<11:34:34, 69.81s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3996 [2024-07-31 14:39:18,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.30 | bwd_microstep: 5573.49 | bwd_inner_microstep: 5504.31 | 
bwd_allreduce_microstep: 69.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2293 [2024-07-31 14:39:26,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.90 | bwd_microstep: 5005.39 | bwd_inner_microstep: 4619.40 | bwd_allreduce_microstep: 385.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250 [2024-07-31 14:39:35,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.24 | bwd_microstep: 5142.38 | bwd_inner_microstep: 4744.16 | bwd_allreduce_microstep: 398.15 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 14:39:44,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.18 | bwd_microstep: 5113.96 | bwd_inner_microstep: 5069.25 | bwd_allreduce_microstep: 44.64 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3728 [2024-07-31 14:39:53,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.44 | bwd_microstep: 5135.26 | bwd_inner_microstep: 5076.71 | bwd_allreduce_microstep: 58.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-31 14:40:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.73 | bwd_microstep: 5124.15 | bwd_inner_microstep: 5077.00 | bwd_allreduce_microstep: 47.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-31 14:40:10,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.33 | bwd_microstep: 5165.49 | bwd_inner_microstep: 4765.42 | bwd_allreduce_microstep: 400.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 14:40:19,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 
[2024-07-31 14:40:19,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.80 | bwd_microstep: 5119.82 | bwd_inner_microstep: 5047.83 | bwd_allreduce_microstep: 71.92 | step_microstep: 182.12 [2024-07-31 14:40:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28129.84 | bwd: 41379.93 | bwd_inner: 39904.01 | bwd_allreduce: 1475.43 | step: 182.83 52%|█████▏ | 634/1230 [12:28:25<11:33:31, 69.82s/it] {'loss': 1.2252, 'learning_rate': 9.986833228223566e-06, 'epoch': 0.52} 52%|█████▏ | 634/1230 [12:28:25<11:33:31, 69.82s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3954 [2024-07-31 14:40:28,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.91 | bwd_microstep: 5310.52 | bwd_inner_microstep: 5236.28 | bwd_allreduce_microstep: 74.16 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3849 [2024-07-31 14:40:37,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.79 | bwd_microstep: 5329.75 | bwd_inner_microstep: 5264.56 | bwd_allreduce_microstep: 65.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-31 14:40:46,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.57 | bwd_microstep: 5176.30 | bwd_inner_microstep: 5093.49 | bwd_allreduce_microstep: 82.75 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2218 [2024-07-31 14:40:55,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.93 | bwd_microstep: 5228.15 | bwd_inner_microstep: 4822.26 | bwd_allreduce_microstep: 405.81 | step_microstep: 0.20 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 14:41:03,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.56 | bwd_microstep: 5118.47 | bwd_inner_microstep: 4721.33 | 
bwd_allreduce_microstep: 397.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 14:41:12,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.16 | bwd_microstep: 5000.54 | bwd_inner_microstep: 4949.23 | bwd_allreduce_microstep: 51.24 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 14:41:20,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3408.67 | bwd_microstep: 5036.87 | bwd_inner_microstep: 4979.16 | bwd_allreduce_microstep: 57.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-31 14:41:29,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 14:41:29,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.18 | bwd_microstep: 4926.10 | bwd_inner_microstep: 4898.95 | bwd_allreduce_microstep: 27.09 | step_microstep: 181.91 [2024-07-31 14:41:29,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28497.66 | bwd: 41126.68 | bwd_inner: 39965.21 | bwd_allreduce: 1160.97 | step: 182.64 52%|█████▏ | 635/1230 [12:29:35<11:32:46, 69.86s/it] {'loss': 1.161, 'learning_rate': 9.960499775976393e-06, 'epoch': 0.52} 52%|█████▏ | 635/1230 [12:29:35<11:32:46, 69.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 14:41:38,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.20 | bwd_microstep: 5220.59 | bwd_inner_microstep: 5194.16 | bwd_allreduce_microstep: 26.36 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4075 [2024-07-31 14:41:47,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.04 | bwd_microstep: 5240.51 | bwd_inner_microstep: 5221.24 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 14:41:56,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.07 | bwd_microstep: 5153.15 | bwd_inner_microstep: 5100.36 | bwd_allreduce_microstep: 52.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 14:42:04,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.92 | bwd_microstep: 5052.18 | bwd_inner_microstep: 5009.24 | bwd_allreduce_microstep: 42.87 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3663 [2024-07-31 14:42:13,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.72 | bwd_microstep: 5180.65 | bwd_inner_microstep: 5091.50 | bwd_allreduce_microstep: 89.07 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 14:42:21,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.77 | bwd_microstep: 4724.27 | bwd_inner_microstep: 4699.34 | bwd_allreduce_microstep: 24.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 14:42:30,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.59 | bwd_microstep: 5080.28 | bwd_inner_microstep: 5036.74 | bwd_allreduce_microstep: 43.48 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3668 [2024-07-31 14:42:39,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 14:42:39,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.72 | bwd_microstep: 5169.46 | bwd_inner_microstep: 5081.26 | bwd_allreduce_microstep: 88.13 | step_microstep: 182.29 [2024-07-31 14:42:39,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28721.92 | bwd: 40821.05 | bwd_inner: 40433.79 | bwd_allreduce: 386.77 | 
step: 182.90 52%|█████▏ | 636/1230 [12:30:45<11:31:39, 69.86s/it] {'loss': 1.1616, 'learning_rate': 9.934166597645703e-06, 'epoch': 0.52} 52%|█████▏ | 636/1230 [12:30:45<11:31:39, 69.86s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4013 [2024-07-31 14:42:48,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.95 | bwd_microstep: 5226.68 | bwd_inner_microstep: 5184.86 | bwd_allreduce_microstep: 41.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920 [2024-07-31 14:42:57,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.37 | bwd_microstep: 5176.88 | bwd_inner_microstep: 5157.60 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3986 [2024-07-31 14:43:06,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.17 | bwd_microstep: 5236.46 | bwd_inner_microstep: 5217.06 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2065 [2024-07-31 14:43:14,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.19 | bwd_microstep: 5161.01 | bwd_inner_microstep: 4759.78 | bwd_allreduce_microstep: 401.16 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2087 [2024-07-31 14:43:23,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.06 | bwd_microstep: 5208.60 | bwd_inner_microstep: 4801.53 | bwd_allreduce_microstep: 407.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 14:43:32,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.98 | bwd_microstep: 5278.51 | bwd_inner_microstep: 5149.04 | bwd_allreduce_microstep: 129.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per 
sample: 13.0, dynamic token length: 3660 [2024-07-31 14:43:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.22 | bwd_microstep: 4956.45 | bwd_inner_microstep: 4925.56 | bwd_allreduce_microstep: 30.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 14:43:50,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 14:43:50,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.18 | bwd_microstep: 5044.77 | bwd_inner_microstep: 4986.01 | bwd_allreduce_microstep: 58.69 | step_microstep: 182.44 [2024-07-31 14:43:50,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29335.02 | bwd: 41289.35 | bwd_inner: 40181.38 | bwd_allreduce: 1107.50 | step: 183.04 52%|█████▏ | 637/1230 [12:31:56<11:33:44, 70.19s/it] {'loss': 1.1537, 'learning_rate': 9.907833875840374e-06, 'epoch': 0.52} 52%|█████▏ | 637/1230 [12:31:56<11:33:44, 70.19s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2377 [2024-07-31 14:43:59,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.76 | bwd_microstep: 5378.27 | bwd_inner_microstep: 4964.55 | bwd_allreduce_microstep: 413.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 14:44:08,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.99 | bwd_microstep: 5229.98 | bwd_inner_microstep: 5144.76 | bwd_allreduce_microstep: 85.15 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-31 14:44:16,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.23 | bwd_microstep: 4842.11 | bwd_inner_microstep: 4822.18 | bwd_allreduce_microstep: 19.87 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-31 14:44:24,151] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3045.30 | bwd_microstep: 4997.45 | bwd_inner_microstep: 4610.52 | bwd_allreduce_microstep: 386.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 14:44:32,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.77 | bwd_microstep: 5072.33 | bwd_inner_microstep: 5030.13 | bwd_allreduce_microstep: 42.13 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3699 [2024-07-31 14:44:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.54 | bwd_microstep: 5156.85 | bwd_inner_microstep: 5061.74 | bwd_allreduce_microstep: 95.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 14:44:50,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3441.91 | bwd_microstep: 5013.61 | bwd_inner_microstep: 4624.72 | bwd_allreduce_microstep: 388.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3653 [2024-07-31 14:44:58,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 14:44:58,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.10 | bwd_microstep: 4993.67 | bwd_inner_microstep: 4928.24 | bwd_allreduce_microstep: 65.35 | step_microstep: 181.50 [2024-07-31 14:44:58,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27597.49 | bwd: 40684.25 | bwd_inner: 39186.78 | bwd_allreduce: 1496.97 | step: 182.21 52%|█████▏ | 638/1230 [12:33:04<11:27:53, 69.72s/it] {'loss': 1.1833, 'learning_rate': 9.881501793166117e-06, 'epoch': 0.52} 52%|█████▏ | 638/1230 [12:33:04<11:27:53, 69.72s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 14:45:08,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3858.73 | 
bwd_microstep: 5338.81 | bwd_inner_microstep: 5319.70 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 14:45:17,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.19 | bwd_microstep: 5356.45 | bwd_inner_microstep: 5244.46 | bwd_allreduce_microstep: 111.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 14:45:25,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.85 | bwd_microstep: 5083.09 | bwd_inner_microstep: 5048.90 | bwd_allreduce_microstep: 34.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 14:45:34,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.62 | bwd_microstep: 5137.20 | bwd_inner_microstep: 5066.50 | bwd_allreduce_microstep: 70.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 14:45:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.70 | bwd_microstep: 5046.90 | bwd_inner_microstep: 4983.51 | bwd_allreduce_microstep: 63.32 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 14:45:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.23 | bwd_microstep: 4964.69 | bwd_inner_microstep: 4934.24 | bwd_allreduce_microstep: 30.38 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 14:46:00,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.59 | bwd_microstep: 5053.15 | bwd_inner_microstep: 4990.02 | bwd_allreduce_microstep: 63.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 14:46:09,429] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 14:46:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.28 | bwd_microstep: 5026.21 | bwd_inner_microstep: 4967.04 | bwd_allreduce_microstep: 59.10 | step_microstep: 181.96 [2024-07-31 14:46:09,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29292.10 | bwd: 41006.47 | bwd_inner: 40554.32 | bwd_allreduce: 451.66 | step: 182.57 52%|█████▏ | 639/1230 [12:34:15<11:29:25, 69.99s/it] {'loss': 1.2053, 'learning_rate': 9.85517053222421e-06, 'epoch': 0.52} 52%|█████▏ | 639/1230 [12:34:15<11:29:25, 69.99s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2360 [2024-07-31 14:46:18,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.07 | bwd_microstep: 5552.19 | bwd_inner_microstep: 5126.77 | bwd_allreduce_microstep: 425.35 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3813 [2024-07-31 14:46:26,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.35 | bwd_microstep: 4833.46 | bwd_inner_microstep: 4814.04 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 14:46:35,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.91 | bwd_microstep: 5094.03 | bwd_inner_microstep: 5023.89 | bwd_allreduce_microstep: 70.07 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-31 14:46:44,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.15 | bwd_microstep: 5223.67 | bwd_inner_microstep: 4815.24 | bwd_allreduce_microstep: 408.36 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2853 [2024-07-31 14:46:53,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.22 | 
bwd_microstep: 5216.56 | bwd_inner_microstep: 4809.82 | bwd_allreduce_microstep: 406.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 14:47:01,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.62 | bwd_microstep: 5004.30 | bwd_inner_microstep: 4965.10 | bwd_allreduce_microstep: 39.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 14:47:10,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.94 | bwd_microstep: 5059.12 | bwd_inner_microstep: 4996.73 | bwd_allreduce_microstep: 62.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3663 [2024-07-31 14:47:19,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 14:47:19,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.99 | bwd_microstep: 5050.46 | bwd_inner_microstep: 5005.28 | bwd_allreduce_microstep: 45.11 | step_microstep: 181.47 [2024-07-31 14:47:19,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28509.16 | bwd: 41033.77 | bwd_inner: 39556.80 | bwd_allreduce: 1476.47 | step: 182.17 52%|█████▏ | 640/1230 [12:35:25<11:27:54, 69.96s/it] {'loss': 1.1529, 'learning_rate': 9.828840275610238e-06, 'epoch': 0.52} 52%|█████▏ | 640/1230 [12:35:25<11:27:54, 69.96s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3889 [2024-07-31 14:47:28,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3832.77 | bwd_microstep: 5144.44 | bwd_inner_microstep: 5119.91 | bwd_allreduce_microstep: 24.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-07-31 14:47:37,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.60 | bwd_microstep: 5037.56 | bwd_inner_microstep: 5018.18 | 
bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-31 14:47:45,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.87 | bwd_microstep: 5045.42 | bwd_inner_microstep: 5018.01 | bwd_allreduce_microstep: 27.35 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 14:47:54,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.01 | bwd_microstep: 4854.95 | bwd_inner_microstep: 4806.64 | bwd_allreduce_microstep: 48.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-31 14:48:02,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.16 | bwd_microstep: 5181.95 | bwd_inner_microstep: 4778.65 | bwd_allreduce_microstep: 403.23 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3625 [2024-07-31 14:48:11,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.54 | bwd_microstep: 5175.90 | bwd_inner_microstep: 5083.22 | bwd_allreduce_microstep: 92.61 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 14:48:20,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.13 | bwd_microstep: 5122.53 | bwd_inner_microstep: 5054.67 | bwd_allreduce_microstep: 67.80 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 14:48:29,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 14:48:29,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.77 | bwd_microstep: 4913.42 | bwd_inner_microstep: 4891.69 | bwd_allreduce_microstep: 21.66 | step_microstep: 181.98 [2024-07-31 14:48:29,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29003.74 | bwd: 40476.17 | bwd_inner: 39770.91 | bwd_allreduce: 704.77 | step: 182.70 52%|█████▏ | 641/1230 [12:36:35<11:26:19, 69.91s/it] {'loss': 1.1838, 'learning_rate': 9.802511205912817e-06, 'epoch': 0.52} 52%|█████▏ | 641/1230 [12:36:35<11:26:19, 69.91s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 14:48:38,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3854.11 | bwd_microstep: 5364.30 | bwd_inner_microstep: 5345.23 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3877 [2024-07-31 14:48:47,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.32 | bwd_microstep: 5112.88 | bwd_inner_microstep: 5093.46 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2230 [2024-07-31 14:48:55,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.32 | bwd_microstep: 5167.45 | bwd_inner_microstep: 4766.11 | bwd_allreduce_microstep: 401.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240 [2024-07-31 14:49:04,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.46 | bwd_microstep: 5129.21 | bwd_inner_microstep: 4733.97 | bwd_allreduce_microstep: 395.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 14:49:13,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.70 | bwd_microstep: 5134.74 | bwd_inner_microstep: 5058.48 | bwd_allreduce_microstep: 76.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 14:49:22,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.81 | bwd_microstep: 5187.27 | bwd_inner_microstep: 4783.76 | 
bwd_allreduce_microstep: 403.44 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2137 [2024-07-31 14:49:30,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.06 | bwd_microstep: 5118.32 | bwd_inner_microstep: 4722.03 | bwd_allreduce_microstep: 396.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 14:49:39,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 14:49:39,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.79 | bwd_microstep: 5181.94 | bwd_inner_microstep: 4780.02 | bwd_allreduce_microstep: 401.84 | step_microstep: 181.89 [2024-07-31 14:49:39,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28801.48 | bwd: 41396.09 | bwd_inner: 39283.01 | bwd_allreduce: 2112.59 | step: 182.49 52%|█████▏ | 642/1230 [12:37:45<11:26:57, 70.10s/it] {'loss': 1.1906, 'learning_rate': 9.77618350571233e-06, 'epoch': 0.52} 52%|█████▏ | 642/1230 [12:37:45<11:26:57, 70.10s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 14:49:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.92 | bwd_microstep: 5381.80 | bwd_inner_microstep: 5359.06 | bwd_allreduce_microstep: 22.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3548 [2024-07-31 14:49:57,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.63 | bwd_microstep: 5269.55 | bwd_inner_microstep: 5167.44 | bwd_allreduce_microstep: 102.05 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3826 [2024-07-31 14:50:06,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.72 | bwd_microstep: 5065.10 | bwd_inner_microstep: 5045.31 | bwd_allreduce_microstep: 19.72 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3763 [2024-07-31 14:50:15,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3362.26 | bwd_microstep: 4970.42 | bwd_inner_microstep: 4940.48 | bwd_allreduce_microstep: 29.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 14:50:23,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.65 | bwd_microstep: 4954.08 | bwd_inner_microstep: 4926.46 | bwd_allreduce_microstep: 27.55 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 1671 [2024-07-31 14:50:32,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3449.70 | bwd_microstep: 5061.03 | bwd_inner_microstep: 4668.40 | bwd_allreduce_microstep: 392.56 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 14:50:41,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.80 | bwd_microstep: 5076.67 | bwd_inner_microstep: 5034.46 | bwd_allreduce_microstep: 42.15 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 14:50:49,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 14:50:49,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.85 | bwd_microstep: 5131.49 | bwd_inner_microstep: 4734.97 | bwd_allreduce_microstep: 396.45 | step_microstep: 181.58 [2024-07-31 14:50:49,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29087.43 | bwd: 40910.12 | bwd_inner: 39876.53 | bwd_allreduce: 1033.12 | step: 182.30 52%|█████▏ | 643/1230 [12:38:55<11:26:29, 70.17s/it] {'loss': 1.2112, 'learning_rate': 9.74985735757967e-06, 'epoch': 0.52} 52%|█████▏ | 643/1230 [12:38:55<11:26:29, 70.17s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3924 [2024-07-31 
14:50:58,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.84 | bwd_microstep: 5170.91 | bwd_inner_microstep: 5127.83 | bwd_allreduce_microstep: 43.01 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2224 [2024-07-31 14:51:07,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3098.10 | bwd_microstep: 5195.56 | bwd_inner_microstep: 4795.47 | bwd_allreduce_microstep: 400.03 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3768 [2024-07-31 14:51:16,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.61 | bwd_microstep: 5221.68 | bwd_inner_microstep: 5136.49 | bwd_allreduce_microstep: 85.12 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3583 [2024-07-31 14:51:24,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.43 | bwd_microstep: 5215.23 | bwd_inner_microstep: 5114.21 | bwd_allreduce_microstep: 100.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 14:51:32,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.73 | bwd_microstep: 4894.73 | bwd_inner_microstep: 4516.55 | bwd_allreduce_microstep: 378.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 14:51:41,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.71 | bwd_microstep: 5038.24 | bwd_inner_microstep: 4971.93 | bwd_allreduce_microstep: 66.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 14:51:50,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.43 | bwd_microstep: 5187.19 | bwd_inner_microstep: 5107.89 | bwd_allreduce_microstep: 79.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3646 [2024-07-31 14:51:59,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 14:51:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.71 | bwd_microstep: 5014.59 | bwd_inner_microstep: 4958.68 | bwd_allreduce_microstep: 55.84 | step_microstep: 181.51 [2024-07-31 14:51:59,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27768.47 | bwd: 40938.11 | bwd_inner: 39728.99 | bwd_allreduce: 1208.63 | step: 182.11 52%|█████▏ | 644/1230 [12:40:04<11:22:00, 69.83s/it] {'loss': 1.2169, 'learning_rate': 9.723532944074961e-06, 'epoch': 0.52} 52%|█████▏ | 644/1230 [12:40:04<11:22:00, 69.83s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 14:52:07,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.52 | bwd_microstep: 5180.86 | bwd_inner_microstep: 5106.50 | bwd_allreduce_microstep: 74.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3742 [2024-07-31 14:52:15,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3125.71 | bwd_microstep: 4926.75 | bwd_inner_microstep: 4889.21 | bwd_allreduce_microstep: 37.47 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3769 [2024-07-31 14:52:24,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.45 | bwd_microstep: 5045.93 | bwd_inner_microstep: 4991.40 | bwd_allreduce_microstep: 54.46 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3764 [2024-07-31 14:52:33,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.03 | bwd_microstep: 5154.46 | bwd_inner_microstep: 5081.69 | bwd_allreduce_microstep: 72.70 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3000 [2024-07-31 14:52:42,061] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.86 | bwd_microstep: 5127.71 | bwd_inner_microstep: 4826.48 | bwd_allreduce_microstep: 301.16 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3636 [2024-07-31 14:52:50,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.23 | bwd_microstep: 5059.64 | bwd_inner_microstep: 4973.32 | bwd_allreduce_microstep: 86.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-31 14:52:59,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.15 | bwd_microstep: 5003.45 | bwd_inner_microstep: 4949.93 | bwd_allreduce_microstep: 53.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 14:53:08,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85 [2024-07-31 14:53:08,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.93 | bwd_microstep: 4927.08 | bwd_inner_microstep: 4903.33 | bwd_allreduce_microstep: 23.68 | step_microstep: 182.57 [2024-07-31 14:53:08,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28350.77 | bwd: 40425.87 | bwd_inner: 39721.81 | bwd_allreduce: 703.59 | step: 183.15 52%|█████▏ | 645/1230 [12:41:14<11:18:44, 69.61s/it] {'loss': 1.1831, 'learning_rate': 9.6972104477463e-06, 'epoch': 0.52} 52%|█████▏ | 645/1230 [12:41:14<11:18:44, 69.61s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2387 [2024-07-31 14:53:17,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.39 | bwd_microstep: 5411.75 | bwd_inner_microstep: 4999.57 | bwd_allreduce_microstep: 412.11 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2229 [2024-07-31 14:53:25,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.12 | 
bwd_microstep: 5175.96 | bwd_inner_microstep: 4773.78 | bwd_allreduce_microstep: 402.11 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 14:53:34,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.32 | bwd_microstep: 5023.69 | bwd_inner_microstep: 4987.65 | bwd_allreduce_microstep: 35.98 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2251 [2024-07-31 14:53:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.22 | bwd_microstep: 5061.15 | bwd_inner_microstep: 4669.03 | bwd_allreduce_microstep: 392.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2130 [2024-07-31 14:53:51,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.30 | bwd_microstep: 5104.49 | bwd_inner_microstep: 4708.74 | bwd_allreduce_microstep: 395.68 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 14:54:00,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.64 | bwd_microstep: 5067.80 | bwd_inner_microstep: 5006.49 | bwd_allreduce_microstep: 61.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 14:54:08,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.17 | bwd_microstep: 4999.44 | bwd_inner_microstep: 4946.87 | bwd_allreduce_microstep: 52.50 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3690 [2024-07-31 14:54:17,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 14:54:17,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.33 | bwd_microstep: 4803.10 | bwd_inner_microstep: 4783.73 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.26 [2024-07-31 
14:54:17,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28254.38 | bwd: 40647.36 | bwd_inner: 38875.78 | bwd_allreduce: 1771.08 | step: 181.95 53%|█████▎ | 646/1230 [12:42:23<11:16:27, 69.50s/it] {'loss': 1.1826, 'learning_rate': 9.670890051128493e-06, 'epoch': 0.53} 53%|█████▎ | 646/1230 [12:42:23<11:16:27, 69.50s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3943 [2024-07-31 14:54:25,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.17 | bwd_microstep: 4891.28 | bwd_inner_microstep: 4869.38 | bwd_allreduce_microstep: 21.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3887 [2024-07-31 14:54:33,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.42 | bwd_microstep: 4980.31 | bwd_inner_microstep: 4956.33 | bwd_allreduce_microstep: 23.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2292 [2024-07-31 14:54:42,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.54 | bwd_microstep: 5262.36 | bwd_inner_microstep: 4854.18 | bwd_allreduce_microstep: 408.12 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2084 [2024-07-31 14:54:51,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.10 | bwd_microstep: 5224.06 | bwd_inner_microstep: 4819.36 | bwd_allreduce_microstep: 404.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 14:55:00,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3437.99 | bwd_microstep: 5008.75 | bwd_inner_microstep: 4620.44 | bwd_allreduce_microstep: 388.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 14:55:08,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.93 | 
bwd_microstep: 5032.95 | bwd_inner_microstep: 4981.12 | bwd_allreduce_microstep: 51.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 14:55:17,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.70 | bwd_microstep: 5010.88 | bwd_inner_microstep: 4960.03 | bwd_allreduce_microstep: 50.78 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 14:55:26,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 14:55:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.37 | bwd_microstep: 4998.43 | bwd_inner_microstep: 4945.35 | bwd_allreduce_microstep: 53.01 | step_microstep: 182.71 [2024-07-31 14:55:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27918.13 | bwd: 40409.01 | bwd_inner: 39006.12 | bwd_allreduce: 1402.39 | step: 183.30 53%|█████▎ | 647/1230 [12:43:31<11:12:51, 69.25s/it] {'loss': 1.183, 'learning_rate': 9.64457193674178e-06, 'epoch': 0.53} 53%|█████▎ | 647/1230 [12:43:31<11:12:51, 69.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3964 [2024-07-31 14:55:34,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.78 | bwd_microstep: 5308.73 | bwd_inner_microstep: 5254.24 | bwd_allreduce_microstep: 54.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-31 14:55:43,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.28 | bwd_microstep: 5047.99 | bwd_inner_microstep: 5022.71 | bwd_allreduce_microstep: 25.22 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2066 [2024-07-31 14:55:51,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.22 | bwd_microstep: 4980.10 | bwd_inner_microstep: 4594.27 | bwd_allreduce_microstep: 
385.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-31 14:56:00,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.82 | bwd_microstep: 5006.87 | bwd_inner_microstep: 4987.58 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 14:56:09,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.26 | bwd_microstep: 5176.92 | bwd_inner_microstep: 4775.37 | bwd_allreduce_microstep: 401.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 14:56:18,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.37 | bwd_microstep: 5154.64 | bwd_inner_microstep: 5100.94 | bwd_allreduce_microstep: 53.63 | step_microstep: 0.21 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 14:56:26,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.13 | bwd_microstep: 5107.19 | bwd_inner_microstep: 4708.27 | bwd_allreduce_microstep: 398.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 14:56:35,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 14:56:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.44 | bwd_microstep: 5030.24 | bwd_inner_microstep: 4979.48 | bwd_allreduce_microstep: 50.69 | step_microstep: 181.86 [2024-07-31 14:56:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28318.20 | bwd: 40812.68 | bwd_inner: 39422.81 | bwd_allreduce: 1389.37 | step: 182.59 53%|█████▎ | 648/1230 [12:44:41<11:12:20, 69.31s/it] {'loss': 1.1277, 'learning_rate': 9.618256287090576e-06, 'epoch': 0.53} 53%|█████▎ | 648/1230 [12:44:41<11:12:20, 69.31s/it]dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3882 [2024-07-31 14:56:44,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.46 | bwd_microstep: 5469.71 | bwd_inner_microstep: 5382.48 | bwd_allreduce_microstep: 87.15 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 14:56:53,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.73 | bwd_microstep: 5269.76 | bwd_inner_microstep: 5176.80 | bwd_allreduce_microstep: 92.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782 [2024-07-31 14:57:02,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.63 | bwd_microstep: 5027.74 | bwd_inner_microstep: 5008.31 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 14:57:11,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.65 | bwd_microstep: 5167.20 | bwd_inner_microstep: 5117.32 | bwd_allreduce_microstep: 49.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3096 [2024-07-31 14:57:20,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.81 | bwd_microstep: 5156.67 | bwd_inner_microstep: 4849.04 | bwd_allreduce_microstep: 307.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 14:57:28,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.68 | bwd_microstep: 5245.65 | bwd_inner_microstep: 4837.79 | bwd_allreduce_microstep: 407.79 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2760 [2024-07-31 14:57:37,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.76 | bwd_microstep: 5090.38 | bwd_inner_microstep: 4694.72 | bwd_allreduce_microstep: 395.59 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 14:57:45,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 14:57:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.87 | bwd_microstep: 4881.81 | bwd_inner_microstep: 4505.75 | bwd_allreduce_microstep: 376.00 | step_microstep: 182.10 [2024-07-31 14:57:45,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28461.47 | bwd: 41308.90 | bwd_inner: 39572.14 | bwd_allreduce: 1736.27 | step: 182.69 53%|█████▎ | 649/1230 [12:45:51<11:13:28, 69.55s/it] {'loss': 1.2159, 'learning_rate': 9.591943284662208e-06, 'epoch': 0.53} 53%|█████▎ | 649/1230 [12:45:51<11:13:28, 69.55s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2065 [2024-07-31 14:57:54,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.37 | bwd_microstep: 5516.50 | bwd_inner_microstep: 5093.61 | bwd_allreduce_microstep: 422.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3868 [2024-07-31 14:58:03,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.19 | bwd_microstep: 5093.03 | bwd_inner_microstep: 5073.56 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3589 [2024-07-31 14:58:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3107.16 | bwd_microstep: 4990.59 | bwd_inner_microstep: 4931.37 | bwd_allreduce_microstep: 59.15 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2071 [2024-07-31 14:58:19,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.24 | bwd_microstep: 5012.34 | bwd_inner_microstep: 4624.45 | bwd_allreduce_microstep: 387.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3731 [2024-07-31 14:58:28,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.57 | bwd_microstep: 4984.86 | bwd_inner_microstep: 4965.47 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-31 14:58:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.95 | bwd_microstep: 5040.10 | bwd_inner_microstep: 5020.80 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 14:58:46,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.18 | bwd_microstep: 5069.00 | bwd_inner_microstep: 5001.73 | bwd_allreduce_microstep: 67.21 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2090 [2024-07-31 14:58:54,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 14:58:54,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.25 | bwd_microstep: 5121.28 | bwd_inner_microstep: 4725.18 | bwd_allreduce_microstep: 396.02 | step_microstep: 181.27 [2024-07-31 14:58:54,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28106.83 | bwd: 40827.68 | bwd_inner: 39436.12 | bwd_allreduce: 1391.07 | step: 181.85 53%|█████▎ | 650/1230 [12:47:00<11:11:29, 69.46s/it] {'loss': 1.1904, 'learning_rate': 9.56563311192564e-06, 'epoch': 0.53} 53%|█████▎ | 650/1230 [12:47:00<11:11:29, 69.46s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2217 [2024-07-31 14:59:03,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3123.38 | bwd_microstep: 5285.15 | bwd_inner_microstep: 4882.67 | bwd_allreduce_microstep: 402.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-31 14:59:12,284] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3653.55 | bwd_microstep: 5319.44 | bwd_inner_microstep: 5251.08 | bwd_allreduce_microstep: 68.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3914 [2024-07-31 14:59:20,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.86 | bwd_microstep: 4978.27 | bwd_inner_microstep: 4958.97 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3907 [2024-07-31 14:59:29,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.23 | bwd_microstep: 5157.82 | bwd_inner_microstep: 5138.39 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-31 14:59:38,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.03 | bwd_microstep: 5214.58 | bwd_inner_microstep: 4812.12 | bwd_allreduce_microstep: 402.39 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 14:59:47,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.82 | bwd_microstep: 4984.27 | bwd_inner_microstep: 4964.96 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-31 14:59:56,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.09 | bwd_microstep: 5007.77 | bwd_inner_microstep: 4988.44 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-31 15:00:04,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 15:00:04,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.87 | bwd_microstep: 5125.47 | bwd_inner_microstep: 4727.76 | bwd_allreduce_microstep: 397.64 | 
step_microstep: 182.64 [2024-07-31 15:00:04,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28721.74 | bwd: 41072.75 | bwd_inner: 39724.34 | bwd_allreduce: 1347.91 | step: 183.33 53%|█████▎ | 651/1230 [12:48:10<11:12:15, 69.66s/it] {'loss': 1.1363, 'learning_rate': 9.539325951330217e-06, 'epoch': 0.53} 53%|█████▎ | 651/1230 [12:48:10<11:12:15, 69.66s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3925 [2024-07-31 15:00:14,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.63 | bwd_microstep: 5541.63 | bwd_inner_microstep: 5452.43 | bwd_allreduce_microstep: 89.13 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-31 15:00:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.57 | bwd_microstep: 5114.74 | bwd_inner_microstep: 5082.63 | bwd_allreduce_microstep: 32.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-31 15:00:31,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.81 | bwd_microstep: 5197.67 | bwd_inner_microstep: 5139.54 | bwd_allreduce_microstep: 58.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-31 15:00:40,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.57 | bwd_microstep: 5112.27 | bwd_inner_microstep: 5077.38 | bwd_allreduce_microstep: 34.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 15:00:48,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3179.11 | bwd_microstep: 4791.56 | bwd_inner_microstep: 4752.36 | bwd_allreduce_microstep: 39.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 15:00:56,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3209.73 | bwd_microstep: 4733.83 | bwd_inner_microstep: 4707.13 | bwd_allreduce_microstep: 26.63 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3683 [2024-07-31 15:01:05,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.31 | bwd_microstep: 5007.89 | bwd_inner_microstep: 4937.81 | bwd_allreduce_microstep: 70.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3181 [2024-07-31 15:01:14,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 15:01:14,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.29 | bwd_microstep: 5171.10 | bwd_inner_microstep: 4919.38 | bwd_allreduce_microstep: 251.64 | step_microstep: 181.61 [2024-07-31 15:01:14,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28347.93 | bwd: 40670.68 | bwd_inner: 40068.60 | bwd_allreduce: 601.59 | step: 182.21 53%|█████▎ | 652/1230 [12:49:20<11:10:12, 69.57s/it] {'loss': 1.1812, 'learning_rate': 9.513021985304399e-06, 'epoch': 0.53} 53%|█████▎ | 652/1230 [12:49:20<11:10:12, 69.57s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4014 [2024-07-31 15:01:23,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.13 | bwd_microstep: 5480.31 | bwd_inner_microstep: 5421.70 | bwd_allreduce_microstep: 58.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-31 15:01:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.22 | bwd_microstep: 5173.50 | bwd_inner_microstep: 5137.11 | bwd_allreduce_microstep: 36.32 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2854 [2024-07-31 15:01:41,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.29 | bwd_microstep: 5327.93 | bwd_inner_microstep: 4913.23 | 
bwd_allreduce_microstep: 414.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 15:01:50,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.08 | bwd_microstep: 5181.85 | bwd_inner_microstep: 5099.37 | bwd_allreduce_microstep: 82.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 15:01:59,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.35 | bwd_microstep: 5270.59 | bwd_inner_microstep: 4863.05 | bwd_allreduce_microstep: 407.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 15:02:07,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.71 | bwd_microstep: 4978.83 | bwd_inner_microstep: 4959.49 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 15:02:16,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.03 | bwd_microstep: 5054.52 | bwd_inner_microstep: 4990.67 | bwd_allreduce_microstep: 63.79 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 15:02:25,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 15:02:25,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.25 | bwd_microstep: 5070.52 | bwd_inner_microstep: 5011.83 | bwd_allreduce_microstep: 58.62 | step_microstep: 181.96 [2024-07-31 15:02:25,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29224.95 | bwd: 41538.03 | bwd_inner: 40396.37 | bwd_allreduce: 1141.17 | step: 182.55 53%|█████▎ | 653/1230 [12:50:31<11:13:27, 70.03s/it] {'loss': 1.1698, 'learning_rate': 9.486721396254482e-06, 'epoch': 0.53} 53%|█████▎ | 653/1230 [12:50:31<11:13:27, 70.03s/it]dynamic ViT batch size: 
8, images per sample: 4.0, dynamic token length: 2340 [2024-07-31 15:02:34,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.82 | bwd_microstep: 5573.20 | bwd_inner_microstep: 5142.29 | bwd_allreduce_microstep: 430.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3856 [2024-07-31 15:02:43,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.56 | bwd_microstep: 5293.60 | bwd_inner_microstep: 5227.58 | bwd_allreduce_microstep: 65.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 15:02:52,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.28 | bwd_microstep: 5121.72 | bwd_inner_microstep: 5047.05 | bwd_allreduce_microstep: 74.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 15:03:01,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.98 | bwd_microstep: 4999.61 | bwd_inner_microstep: 4980.31 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 15:03:09,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.29 | bwd_microstep: 5012.34 | bwd_inner_microstep: 4956.68 | bwd_allreduce_microstep: 55.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 15:03:18,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.34 | bwd_microstep: 5138.11 | bwd_inner_microstep: 4739.49 | bwd_allreduce_microstep: 398.55 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 15:03:27,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.64 | bwd_microstep: 5054.53 | bwd_inner_microstep: 4992.72 | 
bwd_allreduce_microstep: 61.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 15:03:35,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 15:03:35,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.72 | bwd_microstep: 5039.15 | bwd_inner_microstep: 4646.87 | bwd_allreduce_microstep: 392.22 | step_microstep: 181.92 [2024-07-31 15:03:35,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28756.53 | bwd: 41232.26 | bwd_inner: 39732.94 | bwd_allreduce: 1498.83 | step: 182.63 53%|█████▎ | 654/1230 [12:51:41<11:13:07, 70.12s/it] {'loss': 1.1751, 'learning_rate': 9.460424366563356e-06, 'epoch': 0.53} 53%|█████▎ | 654/1230 [12:51:41<11:13:07, 70.12s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3901 [2024-07-31 15:03:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.30 | bwd_microstep: 5547.04 | bwd_inner_microstep: 5466.55 | bwd_allreduce_microstep: 80.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-31 15:03:53,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.91 | bwd_microstep: 5075.40 | bwd_inner_microstep: 5052.53 | bwd_allreduce_microstep: 22.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-31 15:04:02,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.99 | bwd_microstep: 5017.64 | bwd_inner_microstep: 4982.67 | bwd_allreduce_microstep: 34.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 15:04:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.05 | bwd_microstep: 5148.44 | bwd_inner_microstep: 5072.99 | bwd_allreduce_microstep: 75.37 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 15:04:20,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.43 | bwd_microstep: 5175.14 | bwd_inner_microstep: 5094.28 | bwd_allreduce_microstep: 80.78 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3693 [2024-07-31 15:04:28,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.69 | bwd_microstep: 5229.07 | bwd_inner_microstep: 5135.93 | bwd_allreduce_microstep: 93.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 15:04:37,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.34 | bwd_microstep: 5189.72 | bwd_inner_microstep: 5112.10 | bwd_allreduce_microstep: 77.56 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2110 [2024-07-31 15:04:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 15:04:46,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.45 | bwd_microstep: 4977.15 | bwd_inner_microstep: 4589.86 | bwd_allreduce_microstep: 387.23 | step_microstep: 182.62 [2024-07-31 15:04:46,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28764.07 | bwd: 41359.58 | bwd_inner: 40506.84 | bwd_allreduce: 852.26 | step: 183.21 53%|█████▎ | 655/1230 [12:52:52<11:12:55, 70.22s/it] {'loss': 1.16, 'learning_rate': 9.434131078589224e-06, 'epoch': 0.53} 53%|█████▎ | 655/1230 [12:52:52<11:12:55, 70.22s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3984 [2024-07-31 15:04:55,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3829.37 | bwd_microstep: 5345.63 | bwd_inner_microstep: 5313.34 | bwd_allreduce_microstep: 32.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3565 [2024-07-31 15:05:04,243] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.72 | bwd_microstep: 5215.17 | bwd_inner_microstep: 5124.29 | bwd_allreduce_microstep: 90.82 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3971 [2024-07-31 15:05:13,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3819.32 | bwd_microstep: 5250.34 | bwd_inner_microstep: 5230.89 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-31 15:05:21,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3459.21 | bwd_microstep: 5121.10 | bwd_inner_microstep: 5076.72 | bwd_allreduce_microstep: 44.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3753 [2024-07-31 15:05:30,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.81 | bwd_microstep: 5123.68 | bwd_inner_microstep: 5056.28 | bwd_allreduce_microstep: 67.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 15:05:39,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.04 | bwd_microstep: 5199.96 | bwd_inner_microstep: 4796.76 | bwd_allreduce_microstep: 403.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 15:05:48,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.27 | bwd_microstep: 5088.59 | bwd_inner_microstep: 5027.41 | bwd_allreduce_microstep: 61.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 15:05:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 15:05:56,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5040.19 | bwd_inner_microstep: 4982.57 | 
bwd_allreduce_microstep: 57.55 | step_microstep: 181.82 [2024-07-31 15:05:56,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28973.45 | bwd: 41384.64 | bwd_inner: 40608.19 | bwd_allreduce: 775.95 | step: 182.41 53%|█████▎ | 656/1230 [12:54:02<11:13:07, 70.36s/it] {'loss': 1.1353, 'learning_rate': 9.407841714664341e-06, 'epoch': 0.53} 53%|█████▎ | 656/1230 [12:54:02<11:13:07, 70.36s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3938 [2024-07-31 15:06:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.52 | bwd_microstep: 5631.55 | bwd_inner_microstep: 5538.59 | bwd_allreduce_microstep: 92.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3942 [2024-07-31 15:06:15,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.11 | bwd_microstep: 5222.42 | bwd_inner_microstep: 5181.77 | bwd_allreduce_microstep: 40.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 15:06:24,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.32 | bwd_microstep: 5093.52 | bwd_inner_microstep: 5058.31 | bwd_allreduce_microstep: 35.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3620 [2024-07-31 15:06:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.95 | bwd_microstep: 5121.10 | bwd_inner_microstep: 5030.68 | bwd_allreduce_microstep: 90.35 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 15:06:41,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.48 | bwd_microstep: 4976.66 | bwd_inner_microstep: 4945.12 | bwd_allreduce_microstep: 31.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 15:06:49,466] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3181.80 | bwd_microstep: 4711.85 | bwd_inner_microstep: 4685.53 | bwd_allreduce_microstep: 26.26 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3687 [2024-07-31 15:06:57,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3426.12 | bwd_microstep: 4784.71 | bwd_inner_microstep: 4765.33 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3672 [2024-07-31 15:07:06,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 15:07:06,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.86 | bwd_microstep: 5010.14 | bwd_inner_microstep: 4945.10 | bwd_allreduce_microstep: 64.96 | step_microstep: 182.19 [2024-07-31 15:07:06,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28646.06 | bwd: 40551.94 | bwd_inner: 40150.37 | bwd_allreduce: 401.07 | step: 182.88 53%|█████▎ | 657/1230 [12:55:12<11:09:34, 70.11s/it] {'loss': 1.1619, 'learning_rate': 9.381556457093752e-06, 'epoch': 0.53} 53%|█████▎ | 657/1230 [12:55:12<11:09:34, 70.11s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2354 [2024-07-31 15:07:15,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.47 | bwd_microstep: 5519.88 | bwd_inner_microstep: 5097.39 | bwd_allreduce_microstep: 422.42 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2046 [2024-07-31 15:07:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.43 | bwd_microstep: 5271.60 | bwd_inner_microstep: 4864.18 | bwd_allreduce_microstep: 407.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 15:07:33,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.80 | 
bwd_microstep: 5029.85 | bwd_inner_microstep: 5010.61 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 15:07:42,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.12 | bwd_microstep: 5192.88 | bwd_inner_microstep: 5109.26 | bwd_allreduce_microstep: 83.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-31 15:07:50,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.58 | bwd_microstep: 5054.52 | bwd_inner_microstep: 5029.45 | bwd_allreduce_microstep: 24.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 15:07:59,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.47 | bwd_microstep: 5011.05 | bwd_inner_microstep: 4961.16 | bwd_allreduce_microstep: 49.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 15:08:08,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.62 | bwd_microstep: 5262.59 | bwd_inner_microstep: 4857.41 | bwd_allreduce_microstep: 405.11 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3681 [2024-07-31 15:08:17,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 15:08:17,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.73 | bwd_microstep: 5013.13 | bwd_inner_microstep: 4970.50 | bwd_allreduce_microstep: 42.56 | step_microstep: 181.33 [2024-07-31 15:08:17,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29006.12 | bwd: 41355.47 | bwd_inner: 39899.91 | bwd_allreduce: 1455.08 | step: 181.91 53%|█████▎ | 658/1230 [12:56:23<11:10:03, 70.29s/it] {'loss': 1.2014, 'learning_rate': 9.355275488154024e-06, 'epoch': 0.53} 53%|█████▎ | 
658/1230 [12:56:23<11:10:03, 70.29s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4044 [2024-07-31 15:08:26,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.99 | bwd_microstep: 5200.62 | bwd_inner_microstep: 5179.88 | bwd_allreduce_microstep: 20.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-31 15:08:34,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.31 | bwd_microstep: 5075.78 | bwd_inner_microstep: 5042.17 | bwd_allreduce_microstep: 33.55 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2082 [2024-07-31 15:08:43,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.51 | bwd_microstep: 5263.02 | bwd_inner_microstep: 4853.66 | bwd_allreduce_microstep: 409.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 15:08:52,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.61 | bwd_microstep: 5185.92 | bwd_inner_microstep: 5102.06 | bwd_allreduce_microstep: 83.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3827 [2024-07-31 15:09:01,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3808.26 | bwd_microstep: 5277.00 | bwd_inner_microstep: 5235.25 | bwd_allreduce_microstep: 41.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3710 [2024-07-31 15:09:10,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.68 | bwd_microstep: 5029.42 | bwd_inner_microstep: 4962.35 | bwd_allreduce_microstep: 67.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 15:09:18,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.41 | bwd_microstep: 
5110.78 | bwd_inner_microstep: 4714.04 | bwd_allreduce_microstep: 396.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3686 [2024-07-31 15:09:26,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 15:09:26,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3080.76 | bwd_microstep: 4862.71 | bwd_inner_microstep: 4818.12 | bwd_allreduce_microstep: 44.52 | step_microstep: 181.71 [2024-07-31 15:09:26,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28520.44 | bwd: 41005.24 | bwd_inner: 39907.49 | bwd_allreduce: 1097.27 | step: 182.30 54%|█████▎ | 659/1230 [12:57:32<11:07:39, 70.16s/it] {'loss': 1.1892, 'learning_rate': 9.328998990091989e-06, 'epoch': 0.54} 54%|█████▎ | 659/1230 [12:57:32<11:07:39, 70.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 15:09:36,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.61 | bwd_microstep: 5554.73 | bwd_inner_microstep: 5421.56 | bwd_allreduce_microstep: 133.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 15:09:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3376.24 | bwd_microstep: 5074.37 | bwd_inner_microstep: 5035.67 | bwd_allreduce_microstep: 38.63 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2065 [2024-07-31 15:09:53,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.75 | bwd_microstep: 5269.71 | bwd_inner_microstep: 4859.86 | bwd_allreduce_microstep: 409.79 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 15:10:02,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.10 | bwd_microstep: 5059.79 | bwd_inner_microstep: 4667.07 | bwd_allreduce_microstep: 392.66 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 15:10:10,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.58 | bwd_microstep: 5075.69 | bwd_inner_microstep: 5014.71 | bwd_allreduce_microstep: 60.91 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3653 [2024-07-31 15:10:19,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.50 | bwd_microstep: 5134.52 | bwd_inner_microstep: 5047.93 | bwd_allreduce_microstep: 86.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 15:10:28,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.80 | bwd_microstep: 5040.61 | bwd_inner_microstep: 4985.76 | bwd_allreduce_microstep: 54.77 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2098 [2024-07-31 15:10:37,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 15:10:37,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.86 | bwd_microstep: 5120.10 | bwd_inner_microstep: 4723.12 | bwd_allreduce_microstep: 396.90 | step_microstep: 181.90 [2024-07-31 15:10:37,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28428.33 | bwd: 41329.51 | bwd_inner: 39755.61 | bwd_allreduce: 1573.40 | step: 182.50 54%|█████▎ | 660/1230 [12:58:42<11:06:17, 70.14s/it] {'loss': 1.2126, 'learning_rate': 9.302727145123466e-06, 'epoch': 0.54} 54%|█████▎ | 660/1230 [12:58:42<11:06:17, 70.14s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3943 [2024-07-31 15:10:46,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.92 | bwd_microstep: 5160.76 | bwd_inner_microstep: 5141.69 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3889 [2024-07-31 15:10:55,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.72 | bwd_microstep: 5393.45 | bwd_inner_microstep: 5323.49 | bwd_allreduce_microstep: 69.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 15:11:03,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.76 | bwd_microstep: 5138.03 | bwd_inner_microstep: 5087.48 | bwd_allreduce_microstep: 50.48 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-31 15:11:12,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.22 | bwd_microstep: 5177.54 | bwd_inner_microstep: 5137.85 | bwd_allreduce_microstep: 39.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-31 15:11:21,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.29 | bwd_microstep: 5038.67 | bwd_inner_microstep: 5014.05 | bwd_allreduce_microstep: 24.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 15:11:29,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3338.52 | bwd_microstep: 4851.70 | bwd_inner_microstep: 4817.89 | bwd_allreduce_microstep: 33.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-31 15:11:38,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.29 | bwd_microstep: 5000.48 | bwd_inner_microstep: 4981.19 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-31 15:11:47,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-31 15:11:47,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.43 | 
bwd_microstep: 5064.42 | bwd_inner_microstep: 4998.78 | bwd_allreduce_microstep: 65.57 | step_microstep: 182.18 [2024-07-31 15:11:47,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29183.06 | bwd: 40825.03 | bwd_inner: 40502.36 | bwd_allreduce: 322.19 | step: 182.86 54%|█████▎ | 661/1230 [12:59:53<11:05:43, 70.20s/it] {'loss': 1.1503, 'learning_rate': 9.27646013543202e-06, 'epoch': 0.54} 54%|█████▎ | 661/1230 [12:59:53<11:05:43, 70.20s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3648 [2024-07-31 15:11:56,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.09 | bwd_microstep: 5327.87 | bwd_inner_microstep: 5228.20 | bwd_allreduce_microstep: 99.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2323 [2024-07-31 15:12:05,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.45 | bwd_microstep: 5341.99 | bwd_inner_microstep: 4926.67 | bwd_allreduce_microstep: 415.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 15:12:14,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.09 | bwd_microstep: 4990.66 | bwd_inner_microstep: 4970.88 | bwd_allreduce_microstep: 19.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 15:12:22,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.69 | bwd_microstep: 4989.23 | bwd_inner_microstep: 4969.87 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 15:12:31,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.29 | bwd_microstep: 4857.02 | bwd_inner_microstep: 4811.60 | bwd_allreduce_microstep: 45.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token 
length: 2175 [2024-07-31 15:12:39,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3455.34 | bwd_microstep: 5025.99 | bwd_inner_microstep: 4637.16 | bwd_allreduce_microstep: 388.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 15:12:48,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.73 | bwd_microstep: 5258.18 | bwd_inner_microstep: 4849.78 | bwd_allreduce_microstep: 408.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 15:12:57,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 15:12:57,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.52 | bwd_microstep: 4977.08 | bwd_inner_microstep: 4929.28 | bwd_allreduce_microstep: 47.73 | step_microstep: 182.11 [2024-07-31 15:12:57,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28526.12 | bwd: 40767.99 | bwd_inner: 39323.39 | bwd_allreduce: 1444.10 | step: 182.70 54%|█████▍ | 662/1230 [13:01:02<11:02:55, 70.03s/it] {'loss': 1.149, 'learning_rate': 9.250198143167675e-06, 'epoch': 0.54} 54%|█████▍ | 662/1230 [13:01:02<11:02:55, 70.03s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3891 [2024-07-31 15:13:05,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3462.43 | bwd_microstep: 5409.92 | bwd_inner_microstep: 5322.39 | bwd_allreduce_microstep: 87.46 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2040 [2024-07-31 15:13:14,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.38 | bwd_microstep: 5268.70 | bwd_inner_microstep: 4862.90 | bwd_allreduce_microstep: 405.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-07-31 15:13:23,547] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3715.55 | bwd_microstep: 5020.06 | bwd_inner_microstep: 5000.73 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3653 [2024-07-31 15:13:32,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.55 | bwd_microstep: 5082.45 | bwd_inner_microstep: 5030.62 | bwd_allreduce_microstep: 51.76 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2857 [2024-07-31 15:13:40,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.30 | bwd_microstep: 5014.35 | bwd_inner_microstep: 4641.94 | bwd_allreduce_microstep: 372.34 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3721 [2024-07-31 15:13:48,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.46 | bwd_microstep: 4877.56 | bwd_inner_microstep: 4841.68 | bwd_allreduce_microstep: 35.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 15:13:57,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.24 | bwd_microstep: 5152.43 | bwd_inner_microstep: 5079.42 | bwd_allreduce_microstep: 72.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3704 [2024-07-31 15:14:06,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 15:14:06,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.43 | bwd_microstep: 5025.36 | bwd_inner_microstep: 4954.78 | bwd_allreduce_microstep: 70.51 | step_microstep: 181.29 [2024-07-31 15:14:06,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28267.27 | bwd: 40850.82 | bwd_inner: 39734.40 | bwd_allreduce: 1115.93 | step: 181.87 54%|█████▍ | 663/1230 [13:02:12<11:00:06, 69.85s/it] {'loss': 1.1251, 'learning_rate': 9.223941350445666e-06, 
'epoch': 0.54} 54%|█████▍ | 663/1230 [13:02:12<11:00:06, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2298 [2024-07-31 15:14:15,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3170.88 | bwd_microstep: 5417.51 | bwd_inner_microstep: 5005.59 | bwd_allreduce_microstep: 411.84 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3866 [2024-07-31 15:14:24,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.52 | bwd_microstep: 5290.89 | bwd_inner_microstep: 5233.66 | bwd_allreduce_microstep: 57.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-31 15:14:32,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.91 | bwd_microstep: 5177.00 | bwd_inner_microstep: 4774.37 | bwd_allreduce_microstep: 402.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 15:14:41,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.91 | bwd_microstep: 5181.58 | bwd_inner_microstep: 4778.44 | bwd_allreduce_microstep: 403.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 15:14:50,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.99 | bwd_microstep: 5033.71 | bwd_inner_microstep: 5005.03 | bwd_allreduce_microstep: 28.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 15:14:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.33 | bwd_microstep: 5190.51 | bwd_inner_microstep: 5112.51 | bwd_allreduce_microstep: 77.93 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2170 [2024-07-31 15:15:07,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3445.54 | bwd_microstep: 5028.47 | bwd_inner_microstep: 4639.85 | bwd_allreduce_microstep: 388.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3668 [2024-07-31 15:15:16,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 15:15:16,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3392.62 | bwd_microstep: 4949.63 | bwd_inner_microstep: 4882.60 | bwd_allreduce_microstep: 66.95 | step_microstep: 181.85 [2024-07-31 15:15:16,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28115.60 | bwd: 41269.28 | bwd_inner: 39431.97 | bwd_allreduce: 1836.81 | step: 182.43 54%|█████▍ | 664/1230 [13:03:22<10:58:33, 69.81s/it] {'loss': 1.1395, 'learning_rate': 9.19768993934517e-06, 'epoch': 0.54} 54%|█████▍ | 664/1230 [13:03:22<10:58:33, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3750 [2024-07-31 15:15:25,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.97 | bwd_microstep: 5318.94 | bwd_inner_microstep: 5215.73 | bwd_allreduce_microstep: 103.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 15:15:33,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3262.91 | bwd_microstep: 5060.65 | bwd_inner_microstep: 4986.07 | bwd_allreduce_microstep: 74.51 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-31 15:15:42,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.09 | bwd_microstep: 5272.72 | bwd_inner_microstep: 5209.28 | bwd_allreduce_microstep: 63.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 15:15:51,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.35 | bwd_microstep: 5023.05 | bwd_inner_microstep: 4998.69 
| bwd_allreduce_microstep: 24.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 15:15:59,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.48 | bwd_microstep: 5076.85 | bwd_inner_microstep: 5013.31 | bwd_allreduce_microstep: 63.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 15:16:08,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.14 | bwd_microstep: 5123.64 | bwd_inner_microstep: 4725.52 | bwd_allreduce_microstep: 398.06 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 15:16:17,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.11 | bwd_microstep: 5053.17 | bwd_inner_microstep: 4986.14 | bwd_allreduce_microstep: 66.96 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2141 [2024-07-31 15:16:25,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 15:16:25,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2979.32 | bwd_microstep: 4929.30 | bwd_inner_microstep: 4549.36 | bwd_allreduce_microstep: 379.86 | step_microstep: 182.90 [2024-07-31 15:16:25,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27959.27 | bwd: 40858.30 | bwd_inner: 39684.04 | bwd_allreduce: 1173.78 | step: 183.60 54%|█████▍ | 665/1230 [13:04:31<10:55:31, 69.61s/it] {'loss': 1.116, 'learning_rate': 9.171444091908046e-06, 'epoch': 0.54} 54%|█████▍ | 665/1230 [13:04:31<10:55:31, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820 [2024-07-31 15:16:34,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.96 | bwd_microstep: 5287.67 | bwd_inner_microstep: 5222.87 | bwd_allreduce_microstep: 64.73 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 15:16:43,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.26 | bwd_microstep: 5073.69 | bwd_inner_microstep: 5023.93 | bwd_allreduce_microstep: 49.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 15:16:52,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.38 | bwd_microstep: 5314.10 | bwd_inner_microstep: 5235.17 | bwd_allreduce_microstep: 78.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 15:17:00,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.69 | bwd_microstep: 5156.72 | bwd_inner_microstep: 5101.98 | bwd_allreduce_microstep: 54.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 15:17:08,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2988.17 | bwd_microstep: 4837.08 | bwd_inner_microstep: 4464.56 | bwd_allreduce_microstep: 372.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 15:17:16,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3044.43 | bwd_microstep: 5061.08 | bwd_inner_microstep: 4671.77 | bwd_allreduce_microstep: 389.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 15:17:25,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.15 | bwd_microstep: 4992.57 | bwd_inner_microstep: 4957.06 | bwd_allreduce_microstep: 35.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778 [2024-07-31 15:17:34,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81 [2024-07-31 15:17:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3578.43 | bwd_microstep: 4884.98 | bwd_inner_microstep: 4863.03 | bwd_allreduce_microstep: 21.88 | step_microstep: 182.32
[2024-07-31 15:17:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28027.38 | bwd: 40607.88 | bwd_inner: 39540.33 | bwd_allreduce: 1067.06 | step: 182.90
54%|█████▍ | 666/1230 [13:05:40<10:52:33, 69.42s/it]
{'loss': 1.1682, 'learning_rate': 9.14520399013757e-06, 'epoch': 0.54}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3994
[2024-07-31 15:17:43,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3858.01 | bwd_microstep: 5286.69 | bwd_inner_microstep: 5267.45 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2814
[2024-07-31 15:17:52,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.65 | bwd_microstep: 5353.15 | bwd_inner_microstep: 4939.75 | bwd_allreduce_microstep: 413.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214
[2024-07-31 15:18:01,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.05 | bwd_microstep: 5231.91 | bwd_inner_microstep: 4821.35 | bwd_allreduce_microstep: 410.49 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761
[2024-07-31 15:18:10,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.25 | bwd_microstep: 5117.67 | bwd_inner_microstep: 5070.40 | bwd_allreduce_microstep: 47.20 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3618
[2024-07-31 15:18:18,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.20 | bwd_microstep: 5130.03 | bwd_inner_microstep: 5064.68 | bwd_allreduce_microstep: 65.29 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2103
[2024-07-31 15:18:27,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.80 | bwd_microstep: 5182.37 | bwd_inner_microstep: 4779.35 | bwd_allreduce_microstep: 402.95 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680
[2024-07-31 15:18:36,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.42 | bwd_microstep: 4976.69 | bwd_inner_microstep: 4941.51 | bwd_allreduce_microstep: 35.11 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3691
[2024-07-31 15:18:45,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 15:18:45,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.26 | bwd_microstep: 5063.11 | bwd_inner_microstep: 4986.05 | bwd_allreduce_microstep: 76.99 | step_microstep: 181.64
[2024-07-31 15:18:45,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29104.55 | bwd: 41341.60 | bwd_inner: 39870.48 | bwd_allreduce: 1470.64 | step: 182.22
54%|█████▍ | 667/1230 [13:06:51<10:55:13, 69.83s/it]
{'loss': 1.1704, 'learning_rate': 9.118969815997171e-06, 'epoch': 0.54}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3896
[2024-07-31 15:18:54,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.53 | bwd_microstep: 5129.23 | bwd_inner_microstep: 5110.15 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2064
[2024-07-31 15:19:02,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3051.89 | bwd_microstep: 5091.76 | bwd_inner_microstep: 4701.51 | bwd_allreduce_microstep: 390.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601
[2024-07-31 15:19:10,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.33 | bwd_microstep: 4805.15 | bwd_inner_microstep: 4764.72 | bwd_allreduce_microstep: 40.35 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189
[2024-07-31 15:19:19,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.18 | bwd_microstep: 5206.13 | bwd_inner_microstep: 4801.27 | bwd_allreduce_microstep: 404.79 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3650
[2024-07-31 15:19:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.95 | bwd_microstep: 4961.12 | bwd_inner_microstep: 4895.68 | bwd_allreduce_microstep: 65.36 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701
[2024-07-31 15:19:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.43 | bwd_microstep: 4917.41 | bwd_inner_microstep: 4898.08 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 15:19:44,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.78 | bwd_microstep: 4801.15 | bwd_inner_microstep: 4763.82 | bwd_allreduce_microstep: 37.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 15:19:52,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 15:19:53,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.69 | bwd_microstep: 5108.52 | bwd_inner_microstep: 4713.22 | bwd_allreduce_microstep: 395.22 | step_microstep: 181.54
[2024-07-31 15:19:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27540.69 | bwd: 40020.43 | bwd_inner: 38648.39 | bwd_allreduce: 1371.52 | step: 182.16
54%|█████▍ | 668/1230 [13:07:58<10:48:36, 69.25s/it]
{'loss': 1.1565, 'learning_rate': 9.092741751409188e-06, 'epoch': 0.54}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3981
[2024-07-31 15:20:01,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.83 | bwd_microstep: 5176.34 | bwd_inner_microstep: 5151.73 | bwd_allreduce_microstep: 24.55 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3785
[2024-07-31 15:20:10,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.45 | bwd_microstep: 5134.53 | bwd_inner_microstep: 5070.31 | bwd_allreduce_microstep: 64.15 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 15:20:19,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.58 | bwd_microstep: 5249.50 | bwd_inner_microstep: 4843.32 | bwd_allreduce_microstep: 406.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 15:20:28,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.72 | bwd_microstep: 5176.42 | bwd_inner_microstep: 5095.29 | bwd_allreduce_microstep: 81.06 | step_microstep: 0.10
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2885
[2024-07-31 15:20:37,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.46 | bwd_microstep: 5197.83 | bwd_inner_microstep: 4792.61 | bwd_allreduce_microstep: 405.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 15:20:45,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.88 | bwd_microstep: 5018.07 | bwd_inner_microstep: 4960.63 | bwd_allreduce_microstep: 57.37 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 15:20:53,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3006.36 | bwd_microstep: 4888.09 | bwd_inner_microstep: 4512.83 | bwd_allreduce_microstep: 375.19 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145
[2024-07-31 15:21:02,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 15:21:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.85 | bwd_microstep: 5075.74 | bwd_inner_microstep: 4681.75 | bwd_allreduce_microstep: 393.92 | step_microstep: 181.68
[2024-07-31 15:21:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28137.03 | bwd: 40916.49 | bwd_inner: 39108.41 | bwd_allreduce: 1807.60 | step: 182.26
54%|█████▍ | 669/1230 [13:09:08<10:47:49, 69.29s/it]
{'loss': 1.1694, 'learning_rate': 9.06651997825357e-06, 'epoch': 0.54}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 15:21:11,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.75 | bwd_microstep: 5629.02 | bwd_inner_microstep: 5463.37 | bwd_allreduce_microstep: 165.59 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3828
[2024-07-31 15:21:20,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.61 | bwd_microstep: 5161.86 | bwd_inner_microstep: 5115.47 | bwd_allreduce_microstep: 46.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202
[2024-07-31 15:21:29,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.00 | bwd_microstep: 5207.36 | bwd_inner_microstep: 4802.16 | bwd_allreduce_microstep: 405.13 | step_microstep: 0.09
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1231
[2024-07-31 15:21:38,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.36 | bwd_microstep: 5253.65 | bwd_inner_microstep: 4844.90 | bwd_allreduce_microstep: 408.69 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742
[2024-07-31 15:21:46,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.10 | bwd_microstep: 4980.78 | bwd_inner_microstep: 4961.36 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 15:21:55,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.70 | bwd_microstep: 5117.70 | bwd_inner_microstep: 5048.45 | bwd_allreduce_microstep: 69.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650
[2024-07-31 15:22:04,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.37 | bwd_microstep: 5068.84 | bwd_inner_microstep: 5003.06 | bwd_allreduce_microstep: 65.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3663
[2024-07-31 15:22:13,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 15:22:13,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.87 | bwd_microstep: 4982.44 | bwd_inner_microstep: 4944.44 | bwd_allreduce_microstep: 37.92 | step_microstep: 182.17
[2024-07-31 15:22:13,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28990.68 | bwd: 41401.63 | bwd_inner: 40183.15 | bwd_allreduce: 1218.00 | step: 182.76
54%|█████▍ | 670/1230 [13:10:18<10:50:42, 69.72s/it]
{'loss': 1.1667, 'learning_rate': 9.040304678366658e-06, 'epoch': 0.54}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2426
[2024-07-31 15:22:22,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.31 | bwd_microstep: 5313.27 | bwd_inner_microstep: 4904.64 | bwd_allreduce_microstep: 408.55 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2290
[2024-07-31 15:22:30,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.94 | bwd_microstep: 5296.44 | bwd_inner_microstep: 4888.30 | bwd_allreduce_microstep: 408.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824
[2024-07-31 15:22:39,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.94 | bwd_microstep: 5118.53 | bwd_inner_microstep: 5075.27 | bwd_allreduce_microstep: 43.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190
[2024-07-31 15:22:48,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.72 | bwd_microstep: 5204.46 | bwd_inner_microstep: 4801.38 | bwd_allreduce_microstep: 403.01 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720
[2024-07-31 15:22:57,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.88 | bwd_microstep: 5110.76 | bwd_inner_microstep: 5065.52 | bwd_allreduce_microstep: 45.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652
[2024-07-31 15:23:05,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.14 | bwd_microstep: 5053.44 | bwd_inner_microstep: 4992.57 | bwd_allreduce_microstep: 60.79 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171
[2024-07-31 15:23:14,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.98 | bwd_microstep: 5109.78 | bwd_inner_microstep: 4712.32 | bwd_allreduce_microstep: 397.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3693
[2024-07-31 15:23:23,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 15:23:23,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.65 | bwd_microstep: 5057.37 | bwd_inner_microstep: 4982.18 | bwd_allreduce_microstep: 75.12 | step_microstep: 181.45 [2024-07-31 15:23:23,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28603.43 | bwd: 41264.03 | bwd_inner: 39422.12 | bwd_allreduce: 1841.42 | step: 182.03 55%|█████▍ | 671/1230 [13:11:29<10:50:52, 69.86s/it] {'loss': 1.1558, 'learning_rate': 9.014096033539887e-06, 'epoch': 0.55} 55%|█████▍ | 671/1230 [13:11:29<10:50:52, 69.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3888 [2024-07-31 15:23:32,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.40 | bwd_microstep: 5232.21 | bwd_inner_microstep: 5189.59 | bwd_allreduce_microstep: 42.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 15:23:41,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.95 | bwd_microstep: 5536.58 | bwd_inner_microstep: 5105.62 | bwd_allreduce_microstep: 430.89 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2832 [2024-07-31 15:23:49,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3065.58 | bwd_microstep: 4986.52 | bwd_inner_microstep: 4624.19 | bwd_allreduce_microstep: 362.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-31 15:23:58,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.14 | bwd_microstep: 5286.39 | bwd_inner_microstep: 4878.16 | bwd_allreduce_microstep: 408.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 15:24:07,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.13 | bwd_microstep: 5145.70 | bwd_inner_microstep: 5071.67 | 
bwd_allreduce_microstep: 73.96 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3664 [2024-07-31 15:24:15,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.00 | bwd_microstep: 4972.61 | bwd_inner_microstep: 4912.17 | bwd_allreduce_microstep: 60.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 15:24:24,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.96 | bwd_microstep: 5012.41 | bwd_inner_microstep: 4960.79 | bwd_allreduce_microstep: 51.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 15:24:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 15:24:33,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.48 | bwd_microstep: 5083.06 | bwd_inner_microstep: 5023.97 | bwd_allreduce_microstep: 59.03 | step_microstep: 182.87 [2024-07-31 15:24:33,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28110.56 | bwd: 41255.47 | bwd_inner: 39766.09 | bwd_allreduce: 1488.90 | step: 183.58 55%|█████▍ | 672/1230 [13:12:38<10:49:15, 69.81s/it] {'loss': 1.152, 'learning_rate': 8.987894225518556e-06, 'epoch': 0.55} 55%|█████▍ | 672/1230 [13:12:38<10:49:15, 69.81s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3953 [2024-07-31 15:24:42,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3877.97 | bwd_microstep: 5477.46 | bwd_inner_microstep: 5421.43 | bwd_allreduce_microstep: 55.95 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824 [2024-07-31 15:24:51,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.67 | bwd_microstep: 5156.00 | bwd_inner_microstep: 5110.83 | bwd_allreduce_microstep: 45.11 | step_microstep: 0.08 dynamic ViT batch size: 
10, images per sample: 5.0, dynamic token length: 2244 [2024-07-31 15:24:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2550.40 | bwd_microstep: 4849.20 | bwd_inner_microstep: 4476.42 | bwd_allreduce_microstep: 372.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 15:25:07,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.30 | bwd_microstep: 5174.12 | bwd_inner_microstep: 4772.74 | bwd_allreduce_microstep: 401.31 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2055 [2024-07-31 15:25:15,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3153.50 | bwd_microstep: 5018.52 | bwd_inner_microstep: 4631.87 | bwd_allreduce_microstep: 386.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 15:25:24,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.52 | bwd_microstep: 5185.80 | bwd_inner_microstep: 5105.68 | bwd_allreduce_microstep: 80.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 15:25:32,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.47 | bwd_microstep: 5003.11 | bwd_inner_microstep: 4943.92 | bwd_allreduce_microstep: 59.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-31 15:25:41,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 15:25:41,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3015.26 | bwd_microstep: 4924.91 | bwd_inner_microstep: 4547.69 | bwd_allreduce_microstep: 377.16 | step_microstep: 181.53 [2024-07-31 15:25:41,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26884.00 | bwd: 40789.10 | bwd_inner: 39010.52 | bwd_allreduce: 1778.09 
| step: 182.11 55%|█████▍ | 673/1230 [13:13:46<10:43:02, 69.27s/it] {'loss': 1.1772, 'learning_rate': 8.961699436000547e-06, 'epoch': 0.55} 55%|█████▍ | 673/1230 [13:13:46<10:43:02, 69.27s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3967 [2024-07-31 15:25:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.96 | bwd_microstep: 5366.37 | bwd_inner_microstep: 5321.13 | bwd_allreduce_microstep: 45.16 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3568 [2024-07-31 15:25:59,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.00 | bwd_microstep: 5177.48 | bwd_inner_microstep: 5102.29 | bwd_allreduce_microstep: 75.13 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2815 [2024-07-31 15:26:07,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.08 | bwd_microstep: 5229.78 | bwd_inner_microstep: 4822.91 | bwd_allreduce_microstep: 406.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-31 15:26:16,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.95 | bwd_microstep: 5162.87 | bwd_inner_microstep: 5080.01 | bwd_allreduce_microstep: 82.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 15:26:25,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.59 | bwd_microstep: 4963.38 | bwd_inner_microstep: 4926.87 | bwd_allreduce_microstep: 36.44 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3760 [2024-07-31 15:26:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.11 | bwd_microstep: 5183.88 | bwd_inner_microstep: 5119.84 | bwd_allreduce_microstep: 63.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3668 [2024-07-31 15:26:42,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.70 | bwd_microstep: 5070.45 | bwd_inner_microstep: 5010.51 | bwd_allreduce_microstep: 59.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 15:26:51,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 15:26:51,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.28 | bwd_microstep: 5178.10 | bwd_inner_microstep: 4775.83 | bwd_allreduce_microstep: 402.20 | step_microstep: 181.90 [2024-07-31 15:26:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28984.57 | bwd: 41332.28 | bwd_inner: 40159.35 | bwd_allreduce: 1172.45 | step: 182.48 55%|█████▍ | 674/1230 [13:14:57<10:45:45, 69.69s/it] {'loss': 1.2031, 'learning_rate': 8.935511846635072e-06, 'epoch': 0.55} 55%|█████▍ | 674/1230 [13:14:57<10:45:45, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766 [2024-07-31 15:27:00,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.35 | bwd_microstep: 5492.28 | bwd_inner_microstep: 5399.89 | bwd_allreduce_microstep: 92.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 15:27:09,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3270.94 | bwd_microstep: 4952.06 | bwd_inner_microstep: 4923.08 | bwd_allreduce_microstep: 28.91 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3589 [2024-07-31 15:27:17,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.06 | bwd_microstep: 5177.37 | bwd_inner_microstep: 5080.59 | bwd_allreduce_microstep: 96.71 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-31 15:27:26,029] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.65 | bwd_microstep: 4861.91 | bwd_inner_microstep: 4811.49 | bwd_allreduce_microstep: 50.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 15:27:34,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.08 | bwd_microstep: 5114.67 | bwd_inner_microstep: 5053.53 | bwd_allreduce_microstep: 61.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 15:27:43,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.65 | bwd_microstep: 5156.34 | bwd_inner_microstep: 5082.24 | bwd_allreduce_microstep: 74.03 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2136 [2024-07-31 15:27:52,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.10 | bwd_microstep: 5237.54 | bwd_inner_microstep: 4830.57 | bwd_allreduce_microstep: 406.91 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2124 [2024-07-31 15:28:01,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 15:28:01,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.67 | bwd_microstep: 5122.20 | bwd_inner_microstep: 4724.35 | bwd_allreduce_microstep: 397.78 | step_microstep: 182.27 [2024-07-31 15:28:01,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28066.39 | bwd: 41114.36 | bwd_inner: 39905.69 | bwd_allreduce: 1208.19 | step: 182.89 55%|█████▍ | 675/1230 [13:16:07<10:44:05, 69.63s/it] {'loss': 1.1352, 'learning_rate': 8.909331639021414e-06, 'epoch': 0.55} 55%|█████▍ | 675/1230 [13:16:07<10:44:05, 69.63s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2367 [2024-07-31 15:28:10,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.73 | 
bwd_microstep: 5295.78 | bwd_inner_microstep: 4890.28 | bwd_allreduce_microstep: 405.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-31 15:28:18,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.42 | bwd_microstep: 5102.83 | bwd_inner_microstep: 5052.92 | bwd_allreduce_microstep: 49.84 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-07-31 15:28:27,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.55 | bwd_microstep: 5233.06 | bwd_inner_microstep: 4825.88 | bwd_allreduce_microstep: 407.12 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2092 [2024-07-31 15:28:35,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.56 | bwd_microstep: 5042.12 | bwd_inner_microstep: 4653.64 | bwd_allreduce_microstep: 388.41 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1610 [2024-07-31 15:28:44,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.33 | bwd_microstep: 5244.31 | bwd_inner_microstep: 4839.37 | bwd_allreduce_microstep: 404.87 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 15:28:53,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.51 | bwd_microstep: 4986.11 | bwd_inner_microstep: 4949.62 | bwd_allreduce_microstep: 36.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 15:29:01,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.59 | bwd_microstep: 5360.33 | bwd_inner_microstep: 4862.28 | bwd_allreduce_microstep: 497.98 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3676 [2024-07-31 15:29:10,859] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 15:29:10,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.73 | bwd_microstep: 5139.61 | bwd_inner_microstep: 5050.21 | bwd_allreduce_microstep: 89.34 | step_microstep: 181.56 [2024-07-31 15:29:10,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27965.32 | bwd: 41404.14 | bwd_inner: 39124.13 | bwd_allreduce: 2279.51 | step: 182.27 55%|█████▍ | 676/1230 [13:17:16<10:43:06, 69.65s/it] {'loss': 1.185, 'learning_rate': 8.883158994707668e-06, 'epoch': 0.55} 55%|█████▍ | 676/1230 [13:17:16<10:43:06, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4057 [2024-07-31 15:29:20,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.19 | bwd_microstep: 5309.41 | bwd_inner_microstep: 5290.28 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2065 [2024-07-31 15:29:28,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.75 | bwd_microstep: 5198.41 | bwd_inner_microstep: 4798.59 | bwd_allreduce_microstep: 399.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-31 15:29:37,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.42 | bwd_microstep: 4993.29 | bwd_inner_microstep: 4973.91 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-31 15:29:46,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.22 | bwd_microstep: 5034.54 | bwd_inner_microstep: 5015.19 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-31 15:29:55,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.85 | 
bwd_microstep: 5135.44 | bwd_inner_microstep: 5061.40 | bwd_allreduce_microstep: 73.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 15:30:03,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.92 | bwd_microstep: 5187.02 | bwd_inner_microstep: 5100.85 | bwd_allreduce_microstep: 86.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 15:30:12,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.93 | bwd_microstep: 4975.30 | bwd_inner_microstep: 4926.67 | bwd_allreduce_microstep: 48.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 15:30:21,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 15:30:21,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.16 | bwd_microstep: 5022.61 | bwd_inner_microstep: 4966.98 | bwd_allreduce_microstep: 55.57 | step_microstep: 181.91 [2024-07-31 15:30:21,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29131.36 | bwd: 40856.00 | bwd_inner: 40133.81 | bwd_allreduce: 721.69 | step: 182.51 55%|█████▌ | 677/1230 [13:18:27<10:43:47, 69.85s/it] {'loss': 1.1511, 'learning_rate': 8.856994095189477e-06, 'epoch': 0.55} 55%|█████▌ | 677/1230 [13:18:27<10:43:47, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3931 [2024-07-31 15:30:30,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.80 | bwd_microstep: 5630.30 | bwd_inner_microstep: 5533.39 | bwd_allreduce_microstep: 96.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-31 15:30:39,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.23 | bwd_microstep: 5568.21 | bwd_inner_microstep: 5386.14 | 
bwd_allreduce_microstep: 182.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 15:30:48,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.24 | bwd_microstep: 5180.58 | bwd_inner_microstep: 4779.31 | bwd_allreduce_microstep: 401.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-31 15:30:57,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.93 | bwd_microstep: 5285.31 | bwd_inner_microstep: 5217.75 | bwd_allreduce_microstep: 67.48 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 15:31:06,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.90 | bwd_microstep: 5066.97 | bwd_inner_microstep: 5009.11 | bwd_allreduce_microstep: 57.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3661 [2024-07-31 15:31:15,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.68 | bwd_microstep: 5180.34 | bwd_inner_microstep: 5090.15 | bwd_allreduce_microstep: 90.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-31 15:31:23,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.24 | bwd_microstep: 5172.81 | bwd_inner_microstep: 5090.08 | bwd_allreduce_microstep: 82.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-31 15:31:32,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-31 15:31:32,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.40 | bwd_microstep: 5016.16 | bwd_inner_microstep: 4996.80 | bwd_allreduce_microstep: 19.29 | step_microstep: 181.63 [2024-07-31 15:31:32,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29180.33 | bwd: 42100.66 | bwd_inner: 41102.67 | bwd_allreduce: 997.48 | step: 182.22 55%|█████▌ | 678/1230 [13:19:38<10:47:29, 70.38s/it] {'loss': 1.1147, 'learning_rate': 8.830837121908781e-06, 'epoch': 0.55} 55%|█████▌ | 678/1230 [13:19:38<10:47:29, 70.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806 [2024-07-31 15:31:41,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.94 | bwd_microstep: 5196.67 | bwd_inner_microstep: 5148.25 | bwd_allreduce_microstep: 48.35 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 15:31:49,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.05 | bwd_microstep: 5001.09 | bwd_inner_microstep: 4615.90 | bwd_allreduce_microstep: 385.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 15:31:58,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.83 | bwd_microstep: 5134.09 | bwd_inner_microstep: 5060.85 | bwd_allreduce_microstep: 73.17 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 15:32:06,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.85 | bwd_microstep: 5047.99 | bwd_inner_microstep: 4655.72 | bwd_allreduce_microstep: 392.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 15:32:15,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.64 | bwd_microstep: 4991.48 | bwd_inner_microstep: 4959.08 | bwd_allreduce_microstep: 32.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 15:32:24,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.80 | bwd_microstep: 5001.90 | bwd_inner_microstep: 4952.02 | 
bwd_allreduce_microstep: 49.81 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 15:32:32,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.54 | bwd_microstep: 5045.63 | bwd_inner_microstep: 4981.90 | bwd_allreduce_microstep: 63.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 15:32:41,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 15:32:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.51 | bwd_microstep: 5057.75 | bwd_inner_microstep: 4666.35 | bwd_allreduce_microstep: 391.33 | step_microstep: 181.76 [2024-07-31 15:32:41,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27950.07 | bwd: 40476.58 | bwd_inner: 39040.01 | bwd_allreduce: 1436.07 | step: 182.36 55%|█████▌ | 679/1230 [13:20:47<10:41:51, 69.89s/it] {'loss': 1.1575, 'learning_rate': 8.804688256252557e-06, 'epoch': 0.55} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-31 15:32:50,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.34 | bwd_microstep: 5211.50 | bwd_inner_microstep: 5125.15 | bwd_allreduce_microstep: 86.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3812 [2024-07-31 15:32:59,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.84 | bwd_microstep: 5164.70 | bwd_inner_microstep: 5116.74 | bwd_allreduce_microstep: 47.89 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-31 15:33:07,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.84 | bwd_microstep: 5190.50 | bwd_inner_microstep: 4789.45 | bwd_allreduce_microstep: 400.98 | step_microstep: 0.08 dynamic ViT batch size: 
10, images per sample: 5.0, dynamic token length: 2215 [2024-07-31 15:33:16,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.02 | bwd_microstep: 5131.73 | bwd_inner_microstep: 4733.73 | bwd_allreduce_microstep: 397.93 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 15:33:25,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.74 | bwd_microstep: 5178.30 | bwd_inner_microstep: 5101.00 | bwd_allreduce_microstep: 77.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 15:33:34,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.79 | bwd_microstep: 5144.98 | bwd_inner_microstep: 4745.62 | bwd_allreduce_microstep: 399.29 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3708 [2024-07-31 15:33:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.09 | bwd_microstep: 5025.66 | bwd_inner_microstep: 4965.09 | bwd_allreduce_microstep: 60.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 15:33:51,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 15:33:51,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.40 | bwd_microstep: 5058.58 | bwd_inner_microstep: 4990.40 | bwd_allreduce_microstep: 68.11 | step_microstep: 181.40 [2024-07-31 15:33:51,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28473.97 | bwd: 41105.92 | bwd_inner: 39567.11 | bwd_allreduce: 1538.33 | step: 182.10 55%|█████▌ | 680/1230 [13:21:57<10:40:44, 69.90s/it] {'loss': 1.1913, 'learning_rate': 8.778547679551553e-06, 'epoch': 0.55} dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2035 [2024-07-31 15:34:00,618] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.21 | bwd_microstep: 5505.37 | bwd_inner_microstep: 5083.18 | bwd_allreduce_microstep: 422.13 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3874 [2024-07-31 15:34:09,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.75 | bwd_microstep: 5316.90 | bwd_inner_microstep: 5254.74 | bwd_allreduce_microstep: 62.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 15:34:18,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.04 | bwd_microstep: 5028.92 | bwd_inner_microstep: 5009.61 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-31 15:34:27,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.08 | bwd_microstep: 5157.43 | bwd_inner_microstep: 5105.79 | bwd_allreduce_microstep: 51.57 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2097 [2024-07-31 15:34:36,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.76 | bwd_microstep: 5275.63 | bwd_inner_microstep: 4868.35 | bwd_allreduce_microstep: 407.22 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2183 [2024-07-31 15:34:44,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.35 | bwd_microstep: 5001.35 | bwd_inner_microstep: 4614.51 | bwd_allreduce_microstep: 386.77 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2175 [2024-07-31 15:34:52,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.36 | bwd_microstep: 5095.97 | bwd_inner_microstep: 4697.78 | bwd_allreduce_microstep: 398.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3669 [2024-07-31 15:35:01,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 15:35:01,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.82 | bwd_microstep: 5057.55 | bwd_inner_microstep: 4999.88 | bwd_allreduce_microstep: 57.61 | step_microstep: 182.92 [2024-07-31 15:35:01,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28360.25 | bwd: 41439.12 | bwd_inner: 39633.77 | bwd_allreduce: 1804.85 | step: 183.51 55%|█████▌ | 681/1230 [13:23:07<10:40:11, 69.97s/it] {'loss': 1.1794, 'learning_rate': 8.752415573079043e-06, 'epoch': 0.55} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2342 [2024-07-31 15:35:10,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.78 | bwd_microstep: 5494.77 | bwd_inner_microstep: 5073.00 | bwd_allreduce_microstep: 421.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3845 [2024-07-31 15:35:19,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.57 | bwd_microstep: 5095.97 | bwd_inner_microstep: 5076.57 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3869 [2024-07-31 15:35:28,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.93 | bwd_microstep: 5107.19 | bwd_inner_microstep: 5087.81 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 15:35:37,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.90 | bwd_microstep: 5206.77 | bwd_inner_microstep: 5154.99 | bwd_allreduce_microstep: 51.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573 [2024-07-31 15:35:46,047] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.63 | bwd_microstep: 5083.91 | bwd_inner_microstep: 5004.47 | bwd_allreduce_microstep: 79.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 15:35:54,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.12 | bwd_microstep: 4986.29 | bwd_inner_microstep: 4966.87 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3759 [2024-07-31 15:36:02,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3129.61 | bwd_microstep: 4862.26 | bwd_inner_microstep: 4836.15 | bwd_allreduce_microstep: 26.05 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2141 [2024-07-31 15:36:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 15:36:11,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.04 | bwd_microstep: 5146.90 | bwd_inner_microstep: 4748.35 | bwd_allreduce_microstep: 398.48 | step_microstep: 182.19 [2024-07-31 15:36:11,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28750.47 | bwd: 40984.04 | bwd_inner: 39948.14 | bwd_allreduce: 1035.40 | step: 182.77 55%|█████▌ | 682/1230 [13:24:17<10:39:18, 70.00s/it] {'loss': 1.1114, 'learning_rate': 8.726292118049555e-06, 'epoch': 0.55} dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3166 [2024-07-31 15:36:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.62 | bwd_microstep: 5450.55 | bwd_inner_microstep: 5083.97 | bwd_allreduce_microstep: 366.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-31 15:36:29,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.47 | 
bwd_microstep: 5124.67 | bwd_inner_microstep: 5041.48 | bwd_allreduce_microstep: 83.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3856 [2024-07-31 15:36:38,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.76 | bwd_microstep: 5061.00 | bwd_inner_microstep: 5025.96 | bwd_allreduce_microstep: 34.97 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 15:36:47,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.13 | bwd_microstep: 5217.45 | bwd_inner_microstep: 4812.79 | bwd_allreduce_microstep: 404.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 15:36:55,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.59 | bwd_microstep: 5053.07 | bwd_inner_microstep: 4997.19 | bwd_allreduce_microstep: 55.81 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 15:37:04,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.13 | bwd_microstep: 5128.79 | bwd_inner_microstep: 5051.00 | bwd_allreduce_microstep: 77.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3627 [2024-07-31 15:37:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3072.76 | bwd_microstep: 4885.29 | bwd_inner_microstep: 4829.87 | bwd_allreduce_microstep: 55.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 15:37:21,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 15:37:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.78 | bwd_microstep: 5036.77 | bwd_inner_microstep: 4997.92 | bwd_allreduce_microstep: 38.78 | step_microstep: 182.48 [2024-07-31 
15:37:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28329.15 | bwd: 40957.58 | bwd_inner: 39840.12 | bwd_allreduce: 1116.96 | step: 183.08 56%|█████▌ | 683/1230 [13:25:27<10:37:06, 69.88s/it] {'loss': 1.1667, 'learning_rate': 8.700177495617636e-06, 'epoch': 0.56} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4086 [2024-07-31 15:37:30,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.77 | bwd_microstep: 5379.44 | bwd_inner_microstep: 5347.84 | bwd_allreduce_microstep: 31.54 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3973 [2024-07-31 15:37:39,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3834.82 | bwd_microstep: 5242.48 | bwd_inner_microstep: 5223.09 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-31 15:37:48,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.54 | bwd_microstep: 5043.94 | bwd_inner_microstep: 5024.54 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 15:37:57,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.00 | bwd_microstep: 5115.18 | bwd_inner_microstep: 5039.16 | bwd_allreduce_microstep: 75.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 15:38:05,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.89 | bwd_microstep: 4946.19 | bwd_inner_microstep: 4915.92 | bwd_allreduce_microstep: 30.20 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2226 [2024-07-31 15:38:14,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.27 | 
bwd_microstep: 5131.44 | bwd_inner_microstep: 4734.62 | bwd_allreduce_microstep: 396.75 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2144 [2024-07-31 15:38:22,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.13 | bwd_microstep: 4915.43 | bwd_inner_microstep: 4538.70 | bwd_allreduce_microstep: 376.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 15:38:31,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-31 15:38:31,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.43 | bwd_microstep: 4878.47 | bwd_inner_microstep: 4859.04 | bwd_allreduce_microstep: 19.36 | step_microstep: 186.22 [2024-07-31 15:38:31,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28752.75 | bwd: 40652.56 | bwd_inner: 39682.85 | bwd_allreduce: 969.21 | step: 186.92 56%|█████▌ | 684/1230 [13:26:36<10:35:32, 69.84s/it] {'loss': 1.1204, 'learning_rate': 8.674071886876572e-06, 'epoch': 0.56} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-31 15:38:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.22 | bwd_microstep: 5572.71 | bwd_inner_microstep: 5388.83 | bwd_allreduce_microstep: 183.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-31 15:38:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.55 | bwd_microstep: 5355.36 | bwd_inner_microstep: 5291.94 | bwd_allreduce_microstep: 63.35 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3890 [2024-07-31 15:38:58,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.27 | bwd_microstep: 5120.85 | bwd_inner_microstep: 5101.46 | 
bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 15:39:07,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.22 | bwd_microstep: 5060.29 | bwd_inner_microstep: 5036.18 | bwd_allreduce_microstep: 24.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2115 [2024-07-31 15:39:15,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.51 | bwd_microstep: 5025.36 | bwd_inner_microstep: 4639.51 | bwd_allreduce_microstep: 385.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 15:39:24,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.65 | bwd_microstep: 5169.09 | bwd_inner_microstep: 5113.20 | bwd_allreduce_microstep: 55.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2116 [2024-07-31 15:39:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3458.07 | bwd_microstep: 5043.29 | bwd_inner_microstep: 4652.44 | bwd_allreduce_microstep: 390.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 15:39:41,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 15:39:41,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.19 | bwd_microstep: 5124.67 | bwd_inner_microstep: 4727.89 | bwd_allreduce_microstep: 396.71 | step_microstep: 182.09 [2024-07-31 15:39:41,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28587.59 | bwd: 41471.61 | bwd_inner: 39951.40 | bwd_allreduce: 1519.73 | step: 182.68 56%|█████▌ | 685/1230 [13:27:47<10:35:53, 70.01s/it] {'loss': 1.1403, 'learning_rate': 8.647975472857148e-06, 'epoch': 0.56} dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3969 [2024-07-31 15:39:50,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.34 | bwd_microstep: 5360.06 | bwd_inner_microstep: 5311.33 | bwd_allreduce_microstep: 48.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 15:39:59,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.72 | bwd_microstep: 5180.68 | bwd_inner_microstep: 5097.88 | bwd_allreduce_microstep: 82.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 15:40:07,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.08 | bwd_microstep: 4879.56 | bwd_inner_microstep: 4827.88 | bwd_allreduce_microstep: 51.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 15:40:15,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.22 | bwd_microstep: 4823.44 | bwd_inner_microstep: 4778.88 | bwd_allreduce_microstep: 44.50 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2102 [2024-07-31 15:40:24,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.64 | bwd_microstep: 5113.69 | bwd_inner_microstep: 4716.54 | bwd_allreduce_microstep: 397.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 15:40:32,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.91 | bwd_microstep: 4975.23 | bwd_inner_microstep: 4955.81 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 15:40:40,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3008.44 | bwd_microstep: 4873.04 | bwd_inner_microstep: 4495.68 | 
bwd_allreduce_microstep: 377.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 15:40:49,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-31 15:40:49,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.70 | bwd_microstep: 5026.27 | bwd_inner_microstep: 4985.64 | bwd_allreduce_microstep: 40.57 | step_microstep: 182.83 [2024-07-31 15:40:49,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27591.95 | bwd: 40231.96 | bwd_inner: 39169.57 | bwd_allreduce: 1061.90 | step: 183.41 56%|█████▌ | 686/1230 [13:28:55<10:29:40, 69.45s/it] {'loss': 1.2086, 'learning_rate': 8.621888434526382e-06, 'epoch': 0.56} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2449 [2024-07-31 15:40:58,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.73 | bwd_microstep: 5496.50 | bwd_inner_microstep: 5073.87 | bwd_allreduce_microstep: 422.56 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3993 [2024-07-31 15:41:07,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.75 | bwd_microstep: 5122.62 | bwd_inner_microstep: 5090.52 | bwd_allreduce_microstep: 32.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 15:41:15,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.31 | bwd_microstep: 5181.38 | bwd_inner_microstep: 5100.74 | bwd_allreduce_microstep: 80.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 15:41:24,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.65 | bwd_microstep: 5173.01 | bwd_inner_microstep: 5117.95 | bwd_allreduce_microstep: 54.98 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3742 [2024-07-31 15:41:33,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.19 | bwd_microstep: 5054.48 | bwd_inner_microstep: 5013.70 | bwd_allreduce_microstep: 40.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 15:41:41,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.99 | bwd_microstep: 4876.60 | bwd_inner_microstep: 4857.26 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-31 15:41:50,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.22 | bwd_microstep: 5031.09 | bwd_inner_microstep: 4965.71 | bwd_allreduce_microstep: 65.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3663 [2024-07-31 15:41:59,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 15:41:59,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.51 | bwd_microstep: 4925.19 | bwd_inner_microstep: 4898.67 | bwd_allreduce_microstep: 26.46 | step_microstep: 181.52 [2024-07-31 15:41:59,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28695.28 | bwd: 40860.85 | bwd_inner: 40118.35 | bwd_allreduce: 742.00 | step: 182.11 56%|█████▌ | 687/1230 [13:30:05<10:29:42, 69.58s/it] {'loss': 1.1376, 'learning_rate': 8.595810952786289e-06, 'epoch': 0.56} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4016 [2024-07-31 15:42:08,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3847.92 | bwd_microstep: 5270.11 | bwd_inner_microstep: 5251.02 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-31 
15:42:17,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.19 | bwd_microstep: 5104.76 | bwd_inner_microstep: 5075.83 | bwd_allreduce_microstep: 28.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3983 [2024-07-31 15:42:26,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.94 | bwd_microstep: 5254.28 | bwd_inner_microstep: 5235.02 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 15:42:35,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.80 | bwd_microstep: 5030.92 | bwd_inner_microstep: 5011.57 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3757 [2024-07-31 15:42:44,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.14 | bwd_microstep: 5147.34 | bwd_inner_microstep: 5071.27 | bwd_allreduce_microstep: 76.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 15:42:52,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.93 | bwd_microstep: 4990.23 | bwd_inner_microstep: 4970.92 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2105 [2024-07-31 15:43:01,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.40 | bwd_microstep: 5182.50 | bwd_inner_microstep: 4779.63 | bwd_allreduce_microstep: 402.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 15:43:10,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-07-31 15:43:10,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.07 | bwd_microstep: 4927.72 | 
bwd_inner_microstep: 4900.71 | bwd_allreduce_microstep: 26.94 | step_microstep: 181.92 [2024-07-31 15:43:10,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29759.29 | bwd: 40907.85 | bwd_inner: 40295.91 | bwd_allreduce: 611.45 | step: 182.52 56%|█████▌ | 688/1230 [13:31:16<10:32:24, 70.01s/it] {'loss': 1.1492, 'learning_rate': 8.569743208472596e-06, 'epoch': 0.56} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3861 [2024-07-31 15:43:19,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.21 | bwd_microstep: 5511.68 | bwd_inner_microstep: 5421.07 | bwd_allreduce_microstep: 90.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3989 [2024-07-31 15:43:28,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3813.13 | bwd_microstep: 5241.01 | bwd_inner_microstep: 5221.70 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 15:43:37,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.46 | bwd_microstep: 5226.13 | bwd_inner_microstep: 5143.86 | bwd_allreduce_microstep: 82.20 | step_microstep: 0.19 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3788 [2024-07-31 15:43:46,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.97 | bwd_microstep: 4927.52 | bwd_inner_microstep: 4908.16 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3852 [2024-07-31 15:43:54,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.62 | bwd_microstep: 5171.47 | bwd_inner_microstep: 5129.49 | bwd_allreduce_microstep: 41.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 
15:44:03,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.45 | bwd_microstep: 5211.88 | bwd_inner_microstep: 5136.31 | bwd_allreduce_microstep: 75.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 15:44:12,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.78 | bwd_microstep: 5072.58 | bwd_inner_microstep: 5009.06 | bwd_allreduce_microstep: 63.45 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 15:44:21,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 15:44:21,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.52 | bwd_microstep: 5035.71 | bwd_inner_microstep: 4980.11 | bwd_allreduce_microstep: 55.53 | step_microstep: 181.35 [2024-07-31 15:44:21,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29043.06 | bwd: 41397.97 | bwd_inner: 40949.71 | bwd_allreduce: 447.78 | step: 182.05 56%|█████▌ | 689/1230 [13:32:27<10:33:19, 70.24s/it] {'loss': 1.1376, 'learning_rate': 8.54368538235352e-06, 'epoch': 0.56} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 15:44:30,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.96 | bwd_microstep: 5360.43 | bwd_inner_microstep: 5341.37 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3849 [2024-07-31 15:44:38,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.30 | bwd_microstep: 5112.59 | bwd_inner_microstep: 5069.05 | bwd_allreduce_microstep: 43.47 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-31 15:44:47,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3573.89 | bwd_microstep: 5253.54 | bwd_inner_microstep: 4845.29 | bwd_allreduce_microstep: 408.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 15:44:56,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.66 | bwd_microstep: 5165.80 | bwd_inner_microstep: 5083.86 | bwd_allreduce_microstep: 81.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 15:45:05,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.89 | bwd_microstep: 5200.13 | bwd_inner_microstep: 4798.08 | bwd_allreduce_microstep: 401.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 15:45:14,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.43 | bwd_microstep: 5169.57 | bwd_inner_microstep: 5092.92 | bwd_allreduce_microstep: 76.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 15:45:22,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.85 | bwd_microstep: 5031.63 | bwd_inner_microstep: 5006.35 | bwd_allreduce_microstep: 25.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 15:45:31,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 15:45:31,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.21 | bwd_microstep: 4923.62 | bwd_inner_microstep: 4899.22 | bwd_allreduce_microstep: 24.33 | step_microstep: 182.09 [2024-07-31 15:45:31,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28902.11 | bwd: 41217.29 | bwd_inner: 40136.08 | bwd_allreduce: 1080.72 | step: 182.68 56%|█████▌ | 690/1230 [13:33:37<10:32:43, 70.30s/it] {'loss': 1.144, 'learning_rate': 8.517637655128488e-06, 'epoch': 0.56} 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 15:45:40,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.45 | bwd_microstep: 5331.53 | bwd_inner_microstep: 5312.45 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3580 [2024-07-31 15:45:49,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.71 | bwd_microstep: 5240.32 | bwd_inner_microstep: 5105.80 | bwd_allreduce_microstep: 134.45 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3757 [2024-07-31 15:45:58,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.59 | bwd_microstep: 5260.23 | bwd_inner_microstep: 5181.07 | bwd_allreduce_microstep: 79.10 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3689 [2024-07-31 15:46:07,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.73 | bwd_microstep: 5151.78 | bwd_inner_microstep: 5092.36 | bwd_allreduce_microstep: 59.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 15:46:16,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.07 | bwd_microstep: 4991.61 | bwd_inner_microstep: 4937.27 | bwd_allreduce_microstep: 54.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2113 [2024-07-31 15:46:24,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.13 | bwd_microstep: 4911.29 | bwd_inner_microstep: 4533.05 | bwd_allreduce_microstep: 378.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 15:46:32,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.48 | 
bwd_microstep: 4999.14 | bwd_inner_microstep: 4944.87 | bwd_allreduce_microstep: 54.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 15:46:41,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-31 15:46:41,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.02 | bwd_microstep: 5081.08 | bwd_inner_microstep: 4685.89 | bwd_allreduce_microstep: 395.12 | step_microstep: 181.37 [2024-07-31 15:46:41,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28543.09 | bwd: 40966.94 | bwd_inner: 39792.69 | bwd_allreduce: 1173.76 | step: 181.96 56%|█████▌ | 691/1230 [13:34:47<10:30:17, 70.16s/it] {'loss': 1.1479, 'learning_rate': 8.491600207426907e-06, 'epoch': 0.56} 56%|█████▌ | 691/1230 [13:34:47<10:30:17, 70.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-31 15:46:50,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.83 | bwd_microstep: 5238.38 | bwd_inner_microstep: 5185.89 | bwd_allreduce_microstep: 52.42 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304 [2024-07-31 15:46:59,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.63 | bwd_microstep: 5198.39 | bwd_inner_microstep: 4794.69 | bwd_allreduce_microstep: 403.63 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2227 [2024-07-31 15:47:07,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3067.86 | bwd_microstep: 5021.49 | bwd_inner_microstep: 4633.10 | bwd_allreduce_microstep: 388.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 15:47:15,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.75 | bwd_microstep: 5060.82 | bwd_inner_microstep: 4995.39 | 
bwd_allreduce_microstep: 65.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 15:47:24,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.80 | bwd_microstep: 5188.18 | bwd_inner_microstep: 4783.73 | bwd_allreduce_microstep: 404.38 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1147 [2024-07-31 15:47:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2970.29 | bwd_microstep: 4952.38 | bwd_inner_microstep: 4573.94 | bwd_allreduce_microstep: 378.35 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 15:47:41,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.14 | bwd_microstep: 5036.84 | bwd_inner_microstep: 4970.31 | bwd_allreduce_microstep: 66.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 15:47:50,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 15:47:50,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.06 | bwd_microstep: 4897.73 | bwd_inner_microstep: 4874.88 | bwd_allreduce_microstep: 22.77 | step_microstep: 182.48 [2024-07-31 15:47:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27628.27 | bwd: 40594.18 | bwd_inner: 38811.88 | bwd_allreduce: 1781.81 | step: 183.09 56%|█████▋ | 692/1230 [13:35:55<10:24:47, 69.68s/it] {'loss': 1.1704, 'learning_rate': 8.465573219806893e-06, 'epoch': 0.56} 56%|█████▋ | 692/1230 [13:35:55<10:24:47, 69.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3817 [2024-07-31 15:47:59,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.36 | bwd_microstep: 5617.61 | bwd_inner_microstep: 5514.15 | bwd_allreduce_microstep: 103.39 | step_microstep: 0.08 dynamic ViT batch size: 
12, images per sample: 6.0, dynamic token length: 2033 [2024-07-31 15:48:07,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2952.99 | bwd_microstep: 4765.31 | bwd_inner_microstep: 4395.98 | bwd_allreduce_microstep: 369.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-31 15:48:15,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3021.39 | bwd_microstep: 4976.15 | bwd_inner_microstep: 4593.28 | bwd_allreduce_microstep: 382.80 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3616 [2024-07-31 15:48:24,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.22 | bwd_microstep: 5233.53 | bwd_inner_microstep: 5121.68 | bwd_allreduce_microstep: 111.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 15:48:32,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.91 | bwd_microstep: 5073.48 | bwd_inner_microstep: 4681.76 | bwd_allreduce_microstep: 391.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 15:48:40,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.56 | bwd_microstep: 4736.40 | bwd_inner_microstep: 4708.84 | bwd_allreduce_microstep: 27.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 15:48:49,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.67 | bwd_microstep: 4903.19 | bwd_inner_microstep: 4876.81 | bwd_allreduce_microstep: 26.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 15:48:58,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 15:48:58,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3576.25 | bwd_microstep: 5052.54 | bwd_inner_microstep: 4984.28 | bwd_allreduce_microstep: 68.20 | step_microstep: 181.27 [2024-07-31 15:48:58,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27322.25 | bwd: 40358.21 | bwd_inner: 38876.73 | bwd_allreduce: 1481.00 | step: 181.96 56%|█████▋ | 693/1230 [13:37:03<10:19:08, 69.18s/it] {'loss': 1.1555, 'learning_rate': 8.439556872754025e-06, 'epoch': 0.56} 56%|█████▋ | 693/1230 [13:37:03<10:19:08, 69.18s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3550 [2024-07-31 15:49:07,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.85 | bwd_microstep: 5551.36 | bwd_inner_microstep: 5352.43 | bwd_allreduce_microstep: 198.87 | step_microstep: 0.09 dynamic ViT batch size: 5, images per sample: 2.5, dynamic token length: 1293 [2024-07-31 15:49:16,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.95 | bwd_microstep: 5312.67 | bwd_inner_microstep: 4902.44 | bwd_allreduce_microstep: 410.16 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 15:49:24,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.70 | bwd_microstep: 4837.91 | bwd_inner_microstep: 4794.46 | bwd_allreduce_microstep: 43.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-31 15:49:33,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.91 | bwd_microstep: 5026.30 | bwd_inner_microstep: 5002.14 | bwd_allreduce_microstep: 24.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-31 15:49:41,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.33 | bwd_microstep: 5038.51 | bwd_inner_microstep: 5012.82 | bwd_allreduce_microstep: 25.62 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 
4.0, dynamic token length: 2114 [2024-07-31 15:49:50,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.44 | bwd_microstep: 5097.12 | bwd_inner_microstep: 4701.49 | bwd_allreduce_microstep: 395.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 15:49:59,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.12 | bwd_microstep: 5326.35 | bwd_inner_microstep: 5205.57 | bwd_allreduce_microstep: 120.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 15:50:08,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 15:50:08,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.48 | bwd_microstep: 5111.38 | bwd_inner_microstep: 5043.40 | bwd_allreduce_microstep: 67.91 | step_microstep: 182.20 [2024-07-31 15:50:08,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28703.70 | bwd: 41301.59 | bwd_inner: 40014.70 | bwd_allreduce: 1286.41 | step: 182.79 56%|█████▋ | 694/1230 [13:38:14<10:21:07, 69.53s/it] {'loss': 1.1537, 'learning_rate': 8.413551346680093e-06, 'epoch': 0.56} 56%|█████▋ | 694/1230 [13:38:14<10:21:07, 69.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3852 [2024-07-31 15:50:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.96 | bwd_microstep: 5607.11 | bwd_inner_microstep: 5505.98 | bwd_allreduce_microstep: 101.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3901 [2024-07-31 15:50:26,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3289.15 | bwd_microstep: 4981.56 | bwd_inner_microstep: 4958.23 | bwd_allreduce_microstep: 23.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-07-31 15:50:35,055] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.43 | bwd_microstep: 5179.37 | bwd_inner_microstep: 5153.52 | bwd_allreduce_microstep: 25.77 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3691 [2024-07-31 15:50:43,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.40 | bwd_microstep: 5198.59 | bwd_inner_microstep: 5106.39 | bwd_allreduce_microstep: 92.13 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2066 [2024-07-31 15:50:52,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.31 | bwd_microstep: 5233.12 | bwd_inner_microstep: 4827.50 | bwd_allreduce_microstep: 405.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 15:51:01,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.39 | bwd_microstep: 4968.09 | bwd_inner_microstep: 4933.98 | bwd_allreduce_microstep: 34.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 15:51:09,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.91 | bwd_microstep: 4988.73 | bwd_inner_microstep: 4934.99 | bwd_allreduce_microstep: 53.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 15:51:18,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-31 15:51:18,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.15 | bwd_microstep: 5070.77 | bwd_inner_microstep: 5005.68 | bwd_allreduce_microstep: 65.01 | step_microstep: 181.95 [2024-07-31 15:51:18,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28709.61 | bwd: 41227.31 | bwd_inner: 40426.21 | bwd_allreduce: 800.62 | step: 182.53 57%|█████▋ | 695/1230 [13:39:24<10:21:56, 69.75s/it] {'loss': 1.1519, 
'learning_rate': 8.387556821921863e-06, 'epoch': 0.56} 57%|█████▋ | 695/1230 [13:39:24<10:21:56, 69.75s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3866 [2024-07-31 15:51:27,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.28 | bwd_microstep: 5194.52 | bwd_inner_microstep: 5164.51 | bwd_allreduce_microstep: 29.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3982 [2024-07-31 15:51:36,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.64 | bwd_microstep: 5097.66 | bwd_inner_microstep: 5074.17 | bwd_allreduce_microstep: 23.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 15:51:45,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.30 | bwd_microstep: 5231.14 | bwd_inner_microstep: 5145.95 | bwd_allreduce_microstep: 85.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3754 [2024-07-31 15:51:54,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.22 | bwd_microstep: 5200.71 | bwd_inner_microstep: 5119.88 | bwd_allreduce_microstep: 80.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 15:52:03,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.26 | bwd_microstep: 5167.99 | bwd_inner_microstep: 5088.65 | bwd_allreduce_microstep: 79.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 15:52:11,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.07 | bwd_microstep: 4867.97 | bwd_inner_microstep: 4819.19 | bwd_allreduce_microstep: 48.71 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 15:52:19,774] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.82 | bwd_microstep: 5055.61 | bwd_inner_microstep: 4990.34 | bwd_allreduce_microstep: 65.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 15:52:28,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 15:52:28,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.94 | bwd_microstep: 5158.36 | bwd_inner_microstep: 4757.20 | bwd_allreduce_microstep: 401.09 | step_microstep: 182.13 [2024-07-31 15:52:28,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28641.46 | bwd: 40973.94 | bwd_inner: 40159.81 | bwd_allreduce: 813.63 | step: 182.73 57%|█████▋ | 696/1230 [13:40:34<10:21:17, 69.81s/it] {'loss': 1.1028, 'learning_rate': 8.361573478739792e-06, 'epoch': 0.57} 57%|█████▋ | 696/1230 [13:40:34<10:21:17, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3628 [2024-07-31 15:52:37,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.49 | bwd_microstep: 5510.56 | bwd_inner_microstep: 5313.44 | bwd_allreduce_microstep: 197.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-31 15:52:46,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.81 | bwd_microstep: 5283.89 | bwd_inner_microstep: 5192.51 | bwd_allreduce_microstep: 91.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 15:52:55,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.02 | bwd_microstep: 5131.94 | bwd_inner_microstep: 5062.94 | bwd_allreduce_microstep: 68.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-31 15:53:04,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.12 | 
bwd_microstep: 5248.55 | bwd_inner_microstep: 4840.79 | bwd_allreduce_microstep: 407.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 15:53:13,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.07 | bwd_microstep: 5209.00 | bwd_inner_microstep: 5124.57 | bwd_allreduce_microstep: 84.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 15:53:21,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.48 | bwd_microstep: 5160.53 | bwd_inner_microstep: 4758.78 | bwd_allreduce_microstep: 401.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3678 [2024-07-31 15:53:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.81 | bwd_microstep: 5065.94 | bwd_inner_microstep: 4991.24 | bwd_allreduce_microstep: 74.63 | step_microstep: 0.11 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 15:53:39,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 15:53:39,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.00 | bwd_microstep: 4944.55 | bwd_inner_microstep: 4918.54 | bwd_allreduce_microstep: 25.94 | step_microstep: 182.02 [2024-07-31 15:53:39,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28971.71 | bwd: 41554.93 | bwd_inner: 40202.75 | bwd_allreduce: 1351.68 | step: 182.64 57%|█████▋ | 697/1230 [13:41:45<10:22:55, 70.12s/it] {'loss': 1.1715, 'learning_rate': 8.335601497316812e-06, 'epoch': 0.57} 57%|█████▋ | 697/1230 [13:41:45<10:22:55, 70.12s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 15:53:48,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3857.43 | bwd_microstep: 5327.64 | bwd_inner_microstep: 5308.56 | 
bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-07-31 15:53:57,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.99 | bwd_microstep: 5232.73 | bwd_inner_microstep: 5173.33 | bwd_allreduce_microstep: 59.33 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-31 15:54:05,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.10 | bwd_microstep: 5057.69 | bwd_inner_microstep: 4668.09 | bwd_allreduce_microstep: 389.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 15:54:14,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.73 | bwd_microstep: 4982.32 | bwd_inner_microstep: 4962.93 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 15:54:23,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.70 | bwd_microstep: 5025.30 | bwd_inner_microstep: 4983.24 | bwd_allreduce_microstep: 41.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 15:54:31,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.33 | bwd_microstep: 5069.29 | bwd_inner_microstep: 5003.09 | bwd_allreduce_microstep: 66.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 15:54:40,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.85 | bwd_microstep: 4893.36 | bwd_inner_microstep: 4874.03 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 15:54:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 
[2024-07-31 15:54:48,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3021.27 | bwd_microstep: 4928.15 | bwd_inner_microstep: 4549.65 | bwd_allreduce_microstep: 378.42 | step_microstep: 181.28 [2024-07-31 15:54:48,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28279.30 | bwd: 40516.47 | bwd_inner: 39522.85 | bwd_allreduce: 993.11 | step: 181.97 57%|█████▋ | 698/1230 [13:42:54<10:19:07, 69.83s/it] {'loss': 1.173, 'learning_rate': 8.309641057757052e-06, 'epoch': 0.57} 57%|█████▋ | 698/1230 [13:42:54<10:19:07, 69.83s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4057 [2024-07-31 15:54:57,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3894.22 | bwd_microstep: 5323.39 | bwd_inner_microstep: 5304.31 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3856 [2024-07-31 15:55:05,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3182.55 | bwd_microstep: 4867.90 | bwd_inner_microstep: 4847.62 | bwd_allreduce_microstep: 20.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 15:55:14,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.49 | bwd_microstep: 5226.15 | bwd_inner_microstep: 5163.63 | bwd_allreduce_microstep: 62.45 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2078 [2024-07-31 15:55:22,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.05 | bwd_microstep: 4920.51 | bwd_inner_microstep: 4544.44 | bwd_allreduce_microstep: 375.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 15:55:31,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.93 | bwd_microstep: 5096.34 | bwd_inner_microstep: 5028.23 | 
bwd_allreduce_microstep: 68.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 15:55:39,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3329.76 | bwd_microstep: 5023.51 | bwd_inner_microstep: 4631.88 | bwd_allreduce_microstep: 391.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 15:55:48,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.07 | bwd_microstep: 5066.66 | bwd_inner_microstep: 4672.96 | bwd_allreduce_microstep: 393.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 15:55:57,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 15:55:57,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.16 | bwd_microstep: 5086.36 | bwd_inner_microstep: 4691.72 | bwd_allreduce_microstep: 394.57 | step_microstep: 183.96 [2024-07-31 15:55:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27587.14 | bwd: 40610.82 | bwd_inner: 38884.74 | bwd_allreduce: 1725.58 | step: 184.56 57%|█████▋ | 699/1230 [13:44:03<10:14:30, 69.44s/it] {'loss': 1.2277, 'learning_rate': 8.283692340084623e-06, 'epoch': 0.57} 57%|█████▋ | 699/1230 [13:44:03<10:14:30, 69.44s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3959 [2024-07-31 15:56:06,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.43 | bwd_microstep: 5465.82 | bwd_inner_microstep: 5398.39 | bwd_allreduce_microstep: 67.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 15:56:15,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.95 | bwd_microstep: 5112.44 | bwd_inner_microstep: 5084.35 | bwd_allreduce_microstep: 28.03 | step_microstep: 0.08 dynamic ViT batch size: 
12, images per sample: 6.0, dynamic token length: 2192 [2024-07-31 15:56:24,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.83 | bwd_microstep: 5207.37 | bwd_inner_microstep: 4801.84 | bwd_allreduce_microstep: 405.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-31 15:56:32,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3069.80 | bwd_microstep: 5067.76 | bwd_inner_microstep: 4676.38 | bwd_allreduce_microstep: 391.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3643 [2024-07-31 15:56:41,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.81 | bwd_microstep: 5181.02 | bwd_inner_microstep: 5082.27 | bwd_allreduce_microstep: 98.68 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3671 [2024-07-31 15:56:49,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.69 | bwd_microstep: 5005.74 | bwd_inner_microstep: 4945.88 | bwd_allreduce_microstep: 59.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 15:56:58,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.28 | bwd_microstep: 4982.64 | bwd_inner_microstep: 4963.31 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 15:57:07,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 15:57:07,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.20 | bwd_microstep: 5088.81 | bwd_inner_microstep: 5027.69 | bwd_allreduce_microstep: 61.05 | step_microstep: 182.10 [2024-07-31 15:57:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28680.90 | bwd: 41111.57 | bwd_inner: 39980.06 | bwd_allreduce: 1131.02 | 
step: 182.67 57%|█████▋ | 700/1230 [13:45:13<10:15:10, 69.64s/it] {'loss': 1.1248, 'learning_rate': 8.257755524242333e-06, 'epoch': 0.57} 57%|█████▋ | 700/1230 [13:45:13<10:15:10, 69.64s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2291 [2024-07-31 15:57:16,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.95 | bwd_microstep: 5330.88 | bwd_inner_microstep: 4923.91 | bwd_allreduce_microstep: 406.90 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3878 [2024-07-31 15:57:24,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3409.13 | bwd_microstep: 5237.45 | bwd_inner_microstep: 5177.21 | bwd_allreduce_microstep: 60.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266 [2024-07-31 15:57:33,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.95 | bwd_microstep: 5158.43 | bwd_inner_microstep: 4756.74 | bwd_allreduce_microstep: 401.63 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2063 [2024-07-31 15:57:42,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.35 | bwd_microstep: 5193.43 | bwd_inner_microstep: 4790.46 | bwd_allreduce_microstep: 402.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 15:57:50,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.58 | bwd_microstep: 4950.26 | bwd_inner_microstep: 4567.15 | bwd_allreduce_microstep: 383.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 15:57:59,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.89 | bwd_microstep: 4971.35 | bwd_inner_microstep: 4951.84 | bwd_allreduce_microstep: 19.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3634 [2024-07-31 15:58:07,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.84 | bwd_microstep: 5009.14 | bwd_inner_microstep: 4952.71 | bwd_allreduce_microstep: 56.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 15:58:16,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 15:58:16,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.23 | bwd_microstep: 5068.18 | bwd_inner_microstep: 5002.53 | bwd_allreduce_microstep: 65.58 | step_microstep: 181.90 [2024-07-31 15:58:16,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27926.84 | bwd: 40919.10 | bwd_inner: 39122.49 | bwd_allreduce: 1796.13 | step: 182.50 57%|█████▋ | 701/1230 [13:46:22<10:12:46, 69.50s/it] {'loss': 1.1027, 'learning_rate': 8.231830790090461e-06, 'epoch': 0.57} 57%|█████▋ | 701/1230 [13:46:22<10:12:46, 69.50s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3869 [2024-07-31 15:58:25,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.72 | bwd_microstep: 5169.56 | bwd_inner_microstep: 5150.45 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-31 15:58:34,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.87 | bwd_microstep: 5063.22 | bwd_inner_microstep: 5025.71 | bwd_allreduce_microstep: 37.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 15:58:42,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.49 | bwd_microstep: 5107.54 | bwd_inner_microstep: 5037.64 | bwd_allreduce_microstep: 69.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 15:58:51,296] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3457.93 | bwd_microstep: 5032.77 | bwd_inner_microstep: 4642.22 | bwd_allreduce_microstep: 390.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 15:59:00,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.81 | bwd_microstep: 5000.56 | bwd_inner_microstep: 4981.25 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-31 15:59:08,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.12 | bwd_microstep: 4947.23 | bwd_inner_microstep: 4918.89 | bwd_allreduce_microstep: 28.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 15:59:17,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.00 | bwd_microstep: 5075.64 | bwd_inner_microstep: 4682.03 | bwd_allreduce_microstep: 393.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 15:59:26,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 15:59:26,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.24 | bwd_microstep: 4971.74 | bwd_inner_microstep: 4937.79 | bwd_allreduce_microstep: 33.87 | step_microstep: 182.74 [2024-07-31 15:59:26,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28934.06 | bwd: 40368.23 | bwd_inner: 39375.92 | bwd_allreduce: 991.82 | step: 183.34 57%|█████▋ | 702/1230 [13:47:31<10:11:59, 69.54s/it] {'loss': 1.1983, 'learning_rate': 8.20591831740551e-06, 'epoch': 0.57} 57%|█████▋ | 702/1230 [13:47:31<10:11:59, 69.54s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 15:59:35,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3884.22 | 
bwd_microstep: 5348.24 | bwd_inner_microstep: 5329.21 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3861 [2024-07-31 15:59:44,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3419.50 | bwd_microstep: 5266.06 | bwd_inner_microstep: 5212.68 | bwd_allreduce_microstep: 53.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3789 [2024-07-31 15:59:53,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.28 | bwd_microstep: 5309.59 | bwd_inner_microstep: 5216.43 | bwd_allreduce_microstep: 93.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3759 [2024-07-31 16:00:01,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.06 | bwd_microstep: 5135.30 | bwd_inner_microstep: 5063.42 | bwd_allreduce_microstep: 71.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2108 [2024-07-31 16:00:10,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.71 | bwd_microstep: 5190.45 | bwd_inner_microstep: 4788.50 | bwd_allreduce_microstep: 401.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 16:00:19,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.37 | bwd_microstep: 5018.06 | bwd_inner_microstep: 4970.10 | bwd_allreduce_microstep: 47.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-07-31 16:00:27,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.51 | bwd_microstep: 4977.36 | bwd_inner_microstep: 4931.73 | bwd_allreduce_microstep: 45.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 16:00:36,462] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 16:00:36,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.30 | bwd_microstep: 5058.79 | bwd_inner_microstep: 5000.26 | bwd_allreduce_microstep: 58.47 | step_microstep: 182.82 [2024-07-31 16:00:36,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28742.84 | bwd: 41303.84 | bwd_inner: 40512.28 | bwd_allreduce: 791.06 | step: 183.41 57%|█████▋ | 703/1230 [13:48:42<10:13:01, 69.79s/it] {'loss': 1.1773, 'learning_rate': 8.180018285878951e-06, 'epoch': 0.57} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4037 [2024-07-31 16:00:45,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.19 | bwd_microstep: 5348.39 | bwd_inner_microstep: 5329.22 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.20 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-31 16:00:54,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.15 | bwd_microstep: 5149.58 | bwd_inner_microstep: 5118.30 | bwd_allreduce_microstep: 31.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-31 16:01:03,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.82 | bwd_microstep: 5150.45 | bwd_inner_microstep: 5071.73 | bwd_allreduce_microstep: 78.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 16:01:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.90 | bwd_microstep: 4758.83 | bwd_inner_microstep: 4717.27 | bwd_allreduce_microstep: 41.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-31 16:01:19,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.58 |
bwd_microstep: 4915.73 | bwd_inner_microstep: 4884.80 | bwd_allreduce_microstep: 30.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 16:01:28,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.02 | bwd_microstep: 5151.95 | bwd_inner_microstep: 5069.90 | bwd_allreduce_microstep: 81.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 16:01:37,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.11 | bwd_microstep: 5072.39 | bwd_inner_microstep: 5013.08 | bwd_allreduce_microstep: 59.24 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 16:01:45,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 16:01:45,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.61 | bwd_microstep: 5099.89 | bwd_inner_microstep: 4703.77 | bwd_allreduce_microstep: 396.04 | step_microstep: 181.20 [2024-07-31 16:01:45,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28396.28 | bwd: 40647.20 | bwd_inner: 39908.01 | bwd_allreduce: 738.70 | step: 181.90 57%|█████▋ | 704/1230 [13:49:51<10:10:45, 69.67s/it] {'loss': 1.165, 'learning_rate': 8.15413087511598e-06, 'epoch': 0.57} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4055 [2024-07-31 16:01:55,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.25 | bwd_microstep: 5422.53 | bwd_inner_microstep: 5382.41 | bwd_allreduce_microstep: 40.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 16:02:03,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.88 | bwd_microstep: 5279.24 | bwd_inner_microstep: 5181.30 | bwd_allreduce_microstep:
97.87 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3800 [2024-07-31 16:02:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.16 | bwd_microstep: 5278.37 | bwd_inner_microstep: 5198.14 | bwd_allreduce_microstep: 80.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 16:02:21,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.64 | bwd_microstep: 4988.53 | bwd_inner_microstep: 4969.11 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-31 16:02:30,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.32 | bwd_microstep: 5048.65 | bwd_inner_microstep: 5022.88 | bwd_allreduce_microstep: 25.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 16:02:38,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.98 | bwd_microstep: 4942.66 | bwd_inner_microstep: 4895.42 | bwd_allreduce_microstep: 47.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-31 16:02:47,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.76 | bwd_microstep: 5154.77 | bwd_inner_microstep: 5084.51 | bwd_allreduce_microstep: 70.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 16:02:56,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 16:02:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.31 | bwd_microstep: 4917.00 | bwd_inner_microstep: 4895.00 | bwd_allreduce_microstep: 21.93 | step_microstep: 181.97 [2024-07-31 16:02:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29370.21 | bwd: 
41031.74 | bwd_inner: 40628.72 | bwd_allreduce: 402.53 | step: 182.56 57%|█████▋ | 705/1230 [13:51:02<10:12:24, 69.99s/it] {'loss': 1.1435, 'learning_rate': 8.12825626463427e-06, 'epoch': 0.57} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3927 [2024-07-31 16:03:05,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.58 | bwd_microstep: 5466.13 | bwd_inner_microstep: 5395.86 | bwd_allreduce_microstep: 70.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-31 16:03:14,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.98 | bwd_microstep: 5145.07 | bwd_inner_microstep: 5070.83 | bwd_allreduce_microstep: 74.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762 [2024-07-31 16:03:23,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.35 | bwd_microstep: 5009.58 | bwd_inner_microstep: 4990.27 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 16:03:32,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.67 | bwd_microstep: 5239.27 | bwd_inner_microstep: 5150.30 | bwd_allreduce_microstep: 88.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 16:03:40,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.09 | bwd_microstep: 5005.75 | bwd_inner_microstep: 4970.95 | bwd_allreduce_microstep: 34.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-31 16:03:49,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.19 | bwd_microstep: 4836.96 | bwd_inner_microstep: 4812.78 | bwd_allreduce_microstep: 24.11 |
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-07-31 16:03:57,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.04 | bwd_microstep: 4917.45 | bwd_inner_microstep: 4892.75 | bwd_allreduce_microstep: 24.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 16:04:06,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 16:04:06,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.96 | bwd_microstep: 5128.68 | bwd_inner_microstep: 5060.35 | bwd_allreduce_microstep: 68.26 | step_microstep: 181.21 [2024-07-31 16:04:06,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28978.76 | bwd: 40748.87 | bwd_inner: 40344.02 | bwd_allreduce: 404.37 | step: 181.79 57%|█████▋ | 706/1230 [13:52:12<10:11:25, 70.01s/it] {'loss': 1.1666, 'learning_rate': 8.102394633862743e-06, 'epoch': 0.57} dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3805 [2024-07-31 16:04:15,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.16 | bwd_microstep: 5310.85 | bwd_inner_microstep: 5231.17 | bwd_allreduce_microstep: 79.61 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3809 [2024-07-31 16:04:23,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3141.80 | bwd_microstep: 5122.39 | bwd_inner_microstep: 5077.46 | bwd_allreduce_microstep: 44.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-31 16:04:32,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.60 | bwd_microstep: 5180.15 | bwd_inner_microstep: 5098.51 | bwd_allreduce_microstep: 81.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic
token length: 3770 [2024-07-31 16:04:41,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.63 | bwd_microstep: 5033.03 | bwd_inner_microstep: 5009.67 | bwd_allreduce_microstep: 23.29 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 16:04:50,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.01 | bwd_microstep: 4997.54 | bwd_inner_microstep: 4978.19 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 16:04:58,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.94 | bwd_microstep: 4735.22 | bwd_inner_microstep: 4706.80 | bwd_allreduce_microstep: 28.35 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2132 [2024-07-31 16:05:06,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.54 | bwd_microstep: 5086.11 | bwd_inner_microstep: 4693.04 | bwd_allreduce_microstep: 393.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 16:05:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 16:05:15,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.65 | bwd_microstep: 4961.06 | bwd_inner_microstep: 4929.29 | bwd_allreduce_microstep: 31.71 | step_microstep: 181.41 [2024-07-31 16:05:15,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28391.25 | bwd: 40426.32 | bwd_inner: 39724.06 | bwd_allreduce: 701.76 | step: 182.01 57%|█████▋ | 707/1230 [13:53:21<10:08:00, 69.75s/it] {'loss': 1.1618, 'learning_rate': 8.0765461621403e-06, 'epoch': 0.57} dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2394 [2024-07-31 16:05:24,818] [INFO] [logging.py:96:log_dist] [Rank
0] time (ms) | fwd_microstep: 3620.90 | bwd_microstep: 5390.95 | bwd_inner_microstep: 4975.58 | bwd_allreduce_microstep: 415.30 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3820 [2024-07-31 16:05:33,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.94 | bwd_microstep: 5180.44 | bwd_inner_microstep: 5143.21 | bwd_allreduce_microstep: 37.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 16:05:42,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.99 | bwd_microstep: 5240.39 | bwd_inner_microstep: 5154.79 | bwd_allreduce_microstep: 85.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 16:05:51,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.25 | bwd_microstep: 5066.04 | bwd_inner_microstep: 5035.46 | bwd_allreduce_microstep: 30.52 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 16:06:00,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.48 | bwd_microstep: 5211.44 | bwd_inner_microstep: 5131.33 | bwd_allreduce_microstep: 80.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-31 16:06:09,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.42 | bwd_microstep: 5212.66 | bwd_inner_microstep: 4806.10 | bwd_allreduce_microstep: 406.48 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 16:06:17,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.37 | bwd_microstep: 5104.75 | bwd_inner_microstep: 5063.75 | bwd_allreduce_microstep: 40.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 
[2024-07-31 16:06:26,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 16:06:26,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.38 | bwd_microstep: 5224.03 | bwd_inner_microstep: 4818.57 | bwd_allreduce_microstep: 405.39 | step_microstep: 182.53 [2024-07-31 16:06:26,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28965.64 | bwd: 41630.69 | bwd_inner: 40128.73 | bwd_allreduce: 1501.46 | step: 183.23 58%|█████▊ | 708/1230 [13:54:32<10:09:54, 70.11s/it] {'loss': 1.172, 'learning_rate': 8.050711028714592e-06, 'epoch': 0.58} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4077 [2024-07-31 16:06:35,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3357.71 | bwd_microstep: 5167.76 | bwd_inner_microstep: 5148.66 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 16:06:43,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3386.64 | bwd_microstep: 5106.00 | bwd_inner_microstep: 5067.43 | bwd_allreduce_microstep: 38.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3891 [2024-07-31 16:06:52,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.98 | bwd_microstep: 5251.51 | bwd_inner_microstep: 5199.81 | bwd_allreduce_microstep: 51.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241 [2024-07-31 16:07:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.68 | bwd_microstep: 5236.14 | bwd_inner_microstep: 4829.51 | bwd_allreduce_microstep: 406.57 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-31 16:07:10,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3537.94 | bwd_microstep: 5007.40 | bwd_inner_microstep: 4946.56 | bwd_allreduce_microstep: 60.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 16:07:18,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.90 | bwd_microstep: 4986.23 | bwd_inner_microstep: 4966.86 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 16:07:27,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.98 | bwd_microstep: 4925.79 | bwd_inner_microstep: 4885.34 | bwd_allreduce_microstep: 40.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820 [2024-07-31 16:07:35,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 16:07:35,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.06 | bwd_microstep: 4858.07 | bwd_inner_microstep: 4838.71 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.17 [2024-07-31 16:07:35,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28024.80 | bwd: 40538.88 | bwd_inner: 39882.83 | bwd_allreduce: 655.54 | step: 181.76 58%|█████▊ | 709/1230 [13:55:41<10:05:35, 69.74s/it] {'loss': 1.1194, 'learning_rate': 8.02488941274078e-06, 'epoch': 0.58} dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2460 [2024-07-31 16:07:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.92 | bwd_microstep: 5665.65 | bwd_inner_microstep: 5230.27 | bwd_allreduce_microstep: 435.31 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3841 [2024-07-31 16:07:53,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.33 | bwd_microstep: 5045.33 | bwd_inner_microstep: 5024.26 |
bwd_allreduce_microstep: 21.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-31 16:08:01,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3057.23 | bwd_microstep: 5015.20 | bwd_inner_microstep: 4626.69 | bwd_allreduce_microstep: 388.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 16:08:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.15 | bwd_microstep: 4984.01 | bwd_inner_microstep: 4962.87 | bwd_allreduce_microstep: 21.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 16:08:19,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.25 | bwd_microstep: 5217.72 | bwd_inner_microstep: 4812.21 | bwd_allreduce_microstep: 405.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 16:08:28,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.74 | bwd_microstep: 5133.84 | bwd_inner_microstep: 4737.85 | bwd_allreduce_microstep: 395.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 16:08:35,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.51 | bwd_microstep: 4719.30 | bwd_inner_microstep: 4693.67 | bwd_allreduce_microstep: 25.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 16:08:44,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 16:08:44,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.06 | bwd_microstep: 4997.88 | bwd_inner_microstep: 4943.62 | bwd_allreduce_microstep: 54.18 | step_microstep: 182.29 [2024-07-31 16:08:44,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 27980.08 | bwd: 40778.90 | bwd_inner: 39031.38 | bwd_allreduce: 1747.05 | step: 182.87 58%|█████▊ | 710/1230 [13:56:50<10:02:43, 69.55s/it] {'loss': 1.1563, 'learning_rate': 7.999081493280285e-06, 'epoch': 0.58} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3897 [2024-07-31 16:08:53,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.88 | bwd_microstep: 5213.03 | bwd_inner_microstep: 5182.39 | bwd_allreduce_microstep: 30.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3895 [2024-07-31 16:09:01,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3164.91 | bwd_microstep: 4990.48 | bwd_inner_microstep: 4952.28 | bwd_allreduce_microstep: 38.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-31 16:09:10,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.98 | bwd_microstep: 5117.84 | bwd_inner_microstep: 5045.81 | bwd_allreduce_microstep: 71.96 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3602 [2024-07-31 16:09:19,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.03 | bwd_microstep: 5104.84 | bwd_inner_microstep: 5015.35 | bwd_allreduce_microstep: 89.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 16:09:27,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.10 | bwd_microstep: 4872.37 | bwd_inner_microstep: 4852.98 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3705 [2024-07-31 16:09:36,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.27 | bwd_microstep: 5114.37 | bwd_inner_microstep: 5062.89 |
bwd_allreduce_microstep: 51.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 16:09:45,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.40 | bwd_microstep: 4977.97 | bwd_inner_microstep: 4958.67 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 16:09:54,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 16:09:54,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5058.60 | bwd_inner_microstep: 5000.45 | bwd_allreduce_microstep: 58.08 | step_microstep: 181.97 [2024-07-31 16:09:54,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28738.27 | bwd: 40449.49 | bwd_inner: 40070.76 | bwd_allreduce: 378.24 | step: 182.55 58%|█████▊ | 711/1230 [13:58:00<10:01:30, 69.54s/it] {'loss': 1.1356, 'learning_rate': 7.973287449299545e-06, 'epoch': 0.58} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830 [2024-07-31 16:10:03,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.59 | bwd_microstep: 5472.47 | bwd_inner_microstep: 5399.77 | bwd_allreduce_microstep: 72.64 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 16:10:12,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.83 | bwd_microstep: 5418.90 | bwd_inner_microstep: 4999.11 | bwd_allreduce_microstep: 419.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-31 16:10:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.60 | bwd_microstep: 5257.65 | bwd_inner_microstep: 5192.72 | bwd_allreduce_microstep: 64.86 | step_microstep: 0.08 dynamic ViT batch size:
26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 16:10:30,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.25 | bwd_microstep: 4998.94 | bwd_inner_microstep: 4979.65 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 15, images per sample: 7.5, dynamic token length: 2992 [2024-07-31 16:10:39,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.68 | bwd_microstep: 5194.63 | bwd_inner_microstep: 4790.24 | bwd_allreduce_microstep: 404.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 16:10:47,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.00 | bwd_microstep: 5140.51 | bwd_inner_microstep: 4742.52 | bwd_allreduce_microstep: 397.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 16:10:55,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.93 | bwd_microstep: 4673.89 | bwd_inner_microstep: 4653.84 | bwd_allreduce_microstep: 19.97 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 16:11:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 16:11:04,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.67 | bwd_microstep: 5014.65 | bwd_inner_microstep: 4958.73 | bwd_allreduce_microstep: 55.85 | step_microstep: 181.31 [2024-07-31 16:11:04,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28730.46 | bwd: 41171.62 | bwd_inner: 39716.53 | bwd_allreduce: 1454.61 | step: 181.90 58%|█████▊ | 712/1230 [13:59:10<10:02:09, 69.75s/it] {'loss': 1.1282, 'learning_rate': 7.947507459668782e-06, 'epoch': 0.58} dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3857 [2024-07-31
16:11:13,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.55 | bwd_microstep: 5305.60 | bwd_inner_microstep: 5244.36 | bwd_allreduce_microstep: 61.18 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2801 [2024-07-31 16:11:22,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.86 | bwd_microstep: 5241.86 | bwd_inner_microstep: 4838.65 | bwd_allreduce_microstep: 403.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775 [2024-07-31 16:11:31,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.38 | bwd_microstep: 5185.56 | bwd_inner_microstep: 5130.16 | bwd_allreduce_microstep: 55.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3878 [2024-07-31 16:11:39,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.47 | bwd_microstep: 5119.98 | bwd_inner_microstep: 5100.71 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3631 [2024-07-31 16:11:48,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.56 | bwd_microstep: 5183.10 | bwd_inner_microstep: 5089.98 | bwd_allreduce_microstep: 93.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830 [2024-07-31 16:11:57,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.95 | bwd_microstep: 5049.62 | bwd_inner_microstep: 5030.25 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 16:12:06,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.26 | bwd_microstep: 5188.61 | bwd_inner_microstep: 5113.26 | bwd_allreduce_microstep: 75.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images 
per sample: 13.0, dynamic token length: 3710 [2024-07-31 16:12:15,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-07-31 16:12:15,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.77 | bwd_microstep: 4901.04 | bwd_inner_microstep: 4881.60 | bwd_allreduce_microstep: 19.37 | step_microstep: 181.47 [2024-07-31 16:12:15,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29213.71 | bwd: 41175.36 | bwd_inner: 40428.90 | bwd_allreduce: 745.97 | step: 182.17 58%|█████▊ | 713/1230 [14:00:21<10:03:30, 70.04s/it] {'loss': 1.1395, 'learning_rate': 7.921741703160758e-06, 'epoch': 0.58} dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4018 [2024-07-31 16:12:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.74 | bwd_microstep: 5144.92 | bwd_inner_microstep: 5115.93 | bwd_allreduce_microstep: 28.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3561 [2024-07-31 16:12:31,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3338.84 | bwd_microstep: 5044.45 | bwd_inner_microstep: 4976.68 | bwd_allreduce_microstep: 67.68 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 16:12:40,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.25 | bwd_microstep: 5188.81 | bwd_inner_microstep: 5126.77 | bwd_allreduce_microstep: 61.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 16:12:48,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3262.83 | bwd_microstep: 4828.65 | bwd_inner_microstep: 4806.34 | bwd_allreduce_microstep: 22.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-31 16:12:57,714] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.11 | bwd_microstep: 5009.08 | bwd_inner_microstep: 4989.82 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 16:13:06,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.02 | bwd_microstep: 5015.61 | bwd_inner_microstep: 4955.30 | bwd_allreduce_microstep: 60.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 16:13:14,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.54 | bwd_microstep: 4971.85 | bwd_inner_microstep: 4922.82 | bwd_allreduce_microstep: 48.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 16:13:23,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 16:13:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.92 | bwd_microstep: 5048.65 | bwd_inner_microstep: 4658.13 | bwd_allreduce_microstep: 390.45 | step_microstep: 181.96 [2024-07-31 16:13:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27773.15 | bwd: 40252.00 | bwd_inner: 39551.73 | bwd_allreduce: 699.76 | step: 182.55 58%|█████▊ | 714/1230 [14:01:29<9:57:59, 69.53s/it] {'loss': 1.1488, 'learning_rate': 7.895990358449532e-06, 'epoch': 0.58} dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-31 16:13:32,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.63 | bwd_microstep: 5111.83 | bwd_inner_microstep: 5092.65 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3943 [2024-07-31 16:13:41,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.49 |
bwd_microstep: 5184.34 | bwd_inner_microstep: 5164.28 | bwd_allreduce_microstep: 19.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 16:13:49,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3270.25 | bwd_microstep: 4931.52 | bwd_inner_microstep: 4897.50 | bwd_allreduce_microstep: 33.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-07-31 16:13:58,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.76 | bwd_microstep: 5119.70 | bwd_inner_microstep: 5075.13 | bwd_allreduce_microstep: 44.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-31 16:14:06,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.08 | bwd_microstep: 4830.07 | bwd_inner_microstep: 4810.71 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2076 [2024-07-31 16:14:15,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.91 | bwd_microstep: 5069.29 | bwd_inner_microstep: 4677.65 | bwd_allreduce_microstep: 391.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-31 16:14:23,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.43 | bwd_microstep: 5115.33 | bwd_inner_microstep: 4717.66 | bwd_allreduce_microstep: 397.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 16:14:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 16:14:32,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.76 | bwd_microstep: 5062.05 | bwd_inner_microstep: 5002.99 | bwd_allreduce_microstep: 58.99 | step_microstep: 181.21 [2024-07-31 
16:14:32,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28312.21 | bwd: 40424.11 | bwd_inner: 39438.52 | bwd_allreduce: 985.10 | step: 181.79 58%|█████▊ | 715/1230 [14:02:38<9:55:38, 69.39s/it] {'loss': 1.1662, 'learning_rate': 7.870253604109222e-06, 'epoch': 0.58} 58%|█████▊ | 715/1230 [14:02:38<9:55:38, 69.39s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 4096 [2024-07-31 16:14:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.61 | bwd_microstep: 5452.33 | bwd_inner_microstep: 5394.86 | bwd_allreduce_microstep: 57.40 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3786 [2024-07-31 16:14:50,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.99 | bwd_microstep: 5372.07 | bwd_inner_microstep: 5282.80 | bwd_allreduce_microstep: 89.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 16:14:59,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.60 | bwd_microstep: 5160.00 | bwd_inner_microstep: 5084.71 | bwd_allreduce_microstep: 75.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 16:15:07,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.48 | bwd_microstep: 4694.86 | bwd_inner_microstep: 4673.82 | bwd_allreduce_microstep: 20.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-31 16:15:15,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.02 | bwd_microstep: 5197.10 | bwd_inner_microstep: 4794.64 | bwd_allreduce_microstep: 402.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 16:15:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.23 | bwd_microstep: 
5068.38 | bwd_inner_microstep: 5024.94 | bwd_allreduce_microstep: 43.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 16:15:33,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.46 | bwd_microstep: 5080.62 | bwd_inner_microstep: 5037.71 | bwd_allreduce_microstep: 42.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-31 16:15:41,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 16:15:41,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3018.40 | bwd_microstep: 4927.94 | bwd_inner_microstep: 4551.60 | bwd_allreduce_microstep: 376.26 | step_microstep: 181.24 [2024-07-31 16:15:41,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27637.69 | bwd: 40953.28 | bwd_inner: 39845.02 | bwd_allreduce: 1107.78 | step: 181.82 58%|█████▊ | 716/1230 [14:03:47<9:53:15, 69.25s/it] {'loss': 1.1319, 'learning_rate': 7.844531618612772e-06, 'epoch': 0.58} 58%|█████▊ | 716/1230 [14:03:47<9:53:15, 69.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-31 16:15:50,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.73 | bwd_microstep: 5447.22 | bwd_inner_microstep: 5288.15 | bwd_allreduce_microstep: 159.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 16:15:59,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3813.04 | bwd_microstep: 5361.24 | bwd_inner_microstep: 5293.04 | bwd_allreduce_microstep: 68.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2281 [2024-07-31 16:16:08,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.40 | bwd_microstep: 5184.87 | bwd_inner_microstep: 4780.33 | bwd_allreduce_microstep: 404.47 | 
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 16:16:17,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.19 | bwd_microstep: 4974.59 | bwd_inner_microstep: 4955.17 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 16:16:26,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.37 | bwd_microstep: 5039.58 | bwd_inner_microstep: 5012.92 | bwd_allreduce_microstep: 26.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 16:16:34,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.88 | bwd_microstep: 5222.30 | bwd_inner_microstep: 4816.35 | bwd_allreduce_microstep: 405.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 16:16:42,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.12 | bwd_microstep: 4730.71 | bwd_inner_microstep: 4709.67 | bwd_allreduce_microstep: 20.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 16:16:51,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 16:16:51,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.87 | bwd_microstep: 4991.02 | bwd_inner_microstep: 4937.73 | bwd_allreduce_microstep: 53.22 | step_microstep: 181.56 [2024-07-31 16:16:51,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28742.51 | bwd: 40951.50 | bwd_inner: 39793.31 | bwd_allreduce: 1157.68 | step: 182.14 58%|█████▊ | 717/1230 [14:04:57<9:54:05, 69.48s/it] {'loss': 1.1693, 'learning_rate': 7.81882458033071e-06, 'epoch': 0.58} 58%|█████▊ | 717/1230 [14:04:57<9:54:05, 69.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 4020 [2024-07-31 16:17:00,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.29 | bwd_microstep: 5612.89 | bwd_inner_microstep: 5540.67 | bwd_allreduce_microstep: 72.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-31 16:17:09,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.17 | bwd_microstep: 5127.79 | bwd_inner_microstep: 5048.61 | bwd_allreduce_microstep: 79.11 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 16:17:18,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.90 | bwd_microstep: 5157.77 | bwd_inner_microstep: 5077.67 | bwd_allreduce_microstep: 80.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-31 16:17:26,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.01 | bwd_microstep: 5030.98 | bwd_inner_microstep: 5011.67 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 16:17:34,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.87 | bwd_microstep: 4861.97 | bwd_inner_microstep: 4842.65 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 16:17:42,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3217.26 | bwd_microstep: 4805.99 | bwd_inner_microstep: 4786.53 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 16:17:51,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.67 | bwd_microstep: 4892.84 | bwd_inner_microstep: 4873.48 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 16:18:00,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 16:18:00,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.79 | bwd_microstep: 4995.40 | bwd_inner_microstep: 4941.28 | bwd_allreduce_microstep: 54.05 | step_microstep: 181.16 [2024-07-31 16:18:00,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27964.87 | bwd: 40485.61 | bwd_inner: 40122.51 | bwd_allreduce: 362.60 | step: 181.84 58%|█████▊ | 718/1230 [14:06:06<9:51:08, 69.27s/it] {'loss': 1.1639, 'learning_rate': 7.79313266752991e-06, 'epoch': 0.58} 58%|█████▊ | 718/1230 [14:06:06<9:51:08, 69.27s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2306 [2024-07-31 16:18:09,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.52 | bwd_microstep: 5199.32 | bwd_inner_microstep: 4800.21 | bwd_allreduce_microstep: 399.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-31 16:18:17,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.53 | bwd_microstep: 5192.32 | bwd_inner_microstep: 5105.40 | bwd_allreduce_microstep: 86.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 16:18:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.19 | bwd_microstep: 5002.33 | bwd_inner_microstep: 4983.03 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-31 16:18:35,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.85 | bwd_microstep: 5215.71 | bwd_inner_microstep: 4811.59 | bwd_allreduce_microstep: 404.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 
[2024-07-31 16:18:43,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3340.54 | bwd_microstep: 4818.64 | bwd_inner_microstep: 4791.94 | bwd_allreduce_microstep: 26.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 16:18:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2990.43 | bwd_microstep: 4835.69 | bwd_inner_microstep: 4464.89 | bwd_allreduce_microstep: 370.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2131 [2024-07-31 16:19:00,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.51 | bwd_microstep: 5100.41 | bwd_inner_microstep: 4705.39 | bwd_allreduce_microstep: 394.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 16:19:08,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 16:19:08,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.70 | bwd_microstep: 5061.56 | bwd_inner_microstep: 4999.71 | bwd_allreduce_microstep: 61.78 | step_microstep: 182.75 [2024-07-31 16:19:08,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27844.17 | bwd: 40425.97 | bwd_inner: 38662.08 | bwd_allreduce: 1763.40 | step: 183.33 58%|█████▊ | 719/1230 [14:07:14<9:48:15, 69.07s/it] {'loss': 1.2031, 'learning_rate': 7.767456058372362e-06, 'epoch': 0.58} 58%|█████▊ | 719/1230 [14:07:14<9:48:15, 69.07s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2340 [2024-07-31 16:19:17,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.13 | bwd_microstep: 5412.14 | bwd_inner_microstep: 4999.63 | bwd_allreduce_microstep: 412.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846 [2024-07-31 16:19:26,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3781.88 | bwd_microstep: 5087.04 | bwd_inner_microstep: 5067.60 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2059 [2024-07-31 16:19:35,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.63 | bwd_microstep: 5287.12 | bwd_inner_microstep: 4877.60 | bwd_allreduce_microstep: 409.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 16:19:44,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.82 | bwd_microstep: 5159.90 | bwd_inner_microstep: 5104.92 | bwd_allreduce_microstep: 54.91 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3727 [2024-07-31 16:19:52,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3129.60 | bwd_microstep: 4870.10 | bwd_inner_microstep: 4836.60 | bwd_allreduce_microstep: 33.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 16:20:01,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.60 | bwd_microstep: 5160.78 | bwd_inner_microstep: 4758.67 | bwd_allreduce_microstep: 402.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 16:20:09,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.10 | bwd_microstep: 5060.06 | bwd_inner_microstep: 4998.85 | bwd_allreduce_microstep: 61.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 16:20:18,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 16:20:18,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.91 | bwd_microstep: 5069.59 | bwd_inner_microstep: 4676.15 | bwd_allreduce_microstep: 393.37 | step_microstep: 181.40 
[2024-07-31 16:20:18,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28229.57 | bwd: 41106.72 | bwd_inner: 39319.96 | bwd_allreduce: 1786.25 | step: 181.97 59%|█████▊ | 720/1230 [14:08:24<9:48:37, 69.25s/it] {'loss': 1.1188, 'learning_rate': 7.74179493091392e-06, 'epoch': 0.59} 59%|█████▊ | 720/1230 [14:08:24<9:48:37, 69.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4041 [2024-07-31 16:20:27,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.46 | bwd_microstep: 5167.74 | bwd_inner_microstep: 5148.72 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-31 16:20:35,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3096.74 | bwd_microstep: 5190.22 | bwd_inner_microstep: 4791.26 | bwd_allreduce_microstep: 398.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 16:20:44,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.37 | bwd_microstep: 5172.40 | bwd_inner_microstep: 5092.92 | bwd_allreduce_microstep: 79.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 16:20:53,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.84 | bwd_microstep: 5162.85 | bwd_inner_microstep: 5078.81 | bwd_allreduce_microstep: 83.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 16:21:02,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.43 | bwd_microstep: 5030.35 | bwd_inner_microstep: 5005.57 | bwd_allreduce_microstep: 24.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 16:21:10,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.43 | 
bwd_microstep: 5071.07 | bwd_inner_microstep: 5008.23 | bwd_allreduce_microstep: 62.77 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2151 [2024-07-31 16:21:18,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3002.28 | bwd_microstep: 4966.93 | bwd_inner_microstep: 4585.31 | bwd_allreduce_microstep: 381.56 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 16:21:27,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 16:21:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.71 | bwd_microstep: 5003.62 | bwd_inner_microstep: 4981.26 | bwd_allreduce_microstep: 22.29 | step_microstep: 181.36 [2024-07-31 16:21:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27923.17 | bwd: 40765.17 | bwd_inner: 39692.03 | bwd_allreduce: 1072.66 | step: 181.95 59%|█████▊ | 721/1230 [14:09:33<9:46:53, 69.18s/it] {'loss': 1.1507, 'learning_rate': 7.716149463103097e-06, 'epoch': 0.59} 59%|█████▊ | 721/1230 [14:09:33<9:46:53, 69.18s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3862 [2024-07-31 16:21:36,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.34 | bwd_microstep: 5428.71 | bwd_inner_microstep: 5325.97 | bwd_allreduce_microstep: 102.67 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3563 [2024-07-31 16:21:45,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3344.15 | bwd_microstep: 5064.31 | bwd_inner_microstep: 4994.77 | bwd_allreduce_microstep: 69.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-31 16:21:53,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.97 | bwd_microstep: 5173.05 | bwd_inner_microstep: 4771.23 | bwd_allreduce_microstep: 
401.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 16:22:02,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.41 | bwd_microstep: 4983.52 | bwd_inner_microstep: 4948.32 | bwd_allreduce_microstep: 35.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 16:22:11,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.86 | bwd_microstep: 5178.44 | bwd_inner_microstep: 4776.08 | bwd_allreduce_microstep: 402.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 16:22:19,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.94 | bwd_microstep: 5009.77 | bwd_inner_microstep: 4955.42 | bwd_allreduce_microstep: 54.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 16:22:27,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3176.91 | bwd_microstep: 4675.95 | bwd_inner_microstep: 4651.58 | bwd_allreduce_microstep: 24.31 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2141 [2024-07-31 16:22:36,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 16:22:36,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.44 | bwd_microstep: 5036.23 | bwd_inner_microstep: 4645.11 | bwd_allreduce_microstep: 391.03 | step_microstep: 210.14 [2024-07-31 16:22:36,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28013.92 | bwd: 40549.96 | bwd_inner: 39068.41 | bwd_allreduce: 1481.04 | step: 210.71 59%|█████▊ | 722/1230 [14:10:42<9:45:04, 69.10s/it] {'loss': 1.1559, 'learning_rate': 7.6905198327798e-06, 'epoch': 0.59} 59%|█████▊ | 722/1230 [14:10:42<9:45:04, 69.10s/it]dynamic ViT batch size: 16, images per sample: 8.0, 
dynamic token length: 4058 [2024-07-31 16:22:45,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.87 | bwd_microstep: 5201.99 | bwd_inner_microstep: 5169.44 | bwd_allreduce_microstep: 32.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3996 [2024-07-31 16:22:54,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.23 | bwd_microstep: 5105.19 | bwd_inner_microstep: 5083.97 | bwd_allreduce_microstep: 21.15 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 16:23:03,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.36 | bwd_microstep: 5121.35 | bwd_inner_microstep: 5047.22 | bwd_allreduce_microstep: 74.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 16:23:11,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.99 | bwd_microstep: 5178.06 | bwd_inner_microstep: 5120.61 | bwd_allreduce_microstep: 57.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 16:23:20,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.04 | bwd_microstep: 5242.59 | bwd_inner_microstep: 4835.77 | bwd_allreduce_microstep: 406.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 16:23:29,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3282.59 | bwd_microstep: 5087.79 | bwd_inner_microstep: 4693.48 | bwd_allreduce_microstep: 394.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 16:23:37,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.70 | bwd_microstep: 4978.07 | bwd_inner_microstep: 4922.22 | bwd_allreduce_microstep: 55.78 | step_microstep: 
0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 16:23:46,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 16:23:46,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.00 | bwd_microstep: 5130.22 | bwd_inner_microstep: 5059.37 | bwd_allreduce_microstep: 70.78 | step_microstep: 182.51 [2024-07-31 16:23:46,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28576.67 | bwd: 41045.24 | bwd_inner: 39932.04 | bwd_allreduce: 1112.72 | step: 183.21 59%|█████▉ | 723/1230 [14:11:52<9:46:04, 69.36s/it] {'loss': 1.1723, 'learning_rate': 7.664906217674111e-06, 'epoch': 0.59} 59%|█████▉ | 723/1230 [14:11:52<9:46:04, 69.36s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4008 [2024-07-31 16:23:55,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.38 | bwd_microstep: 5504.55 | bwd_inner_microstep: 5441.44 | bwd_allreduce_microstep: 63.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-31 16:24:04,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.32 | bwd_microstep: 5265.51 | bwd_inner_microstep: 4858.81 | bwd_allreduce_microstep: 406.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-31 16:24:13,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3381.11 | bwd_microstep: 5073.92 | bwd_inner_microstep: 5035.17 | bwd_allreduce_microstep: 38.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 16:24:21,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.25 | bwd_microstep: 4978.79 | bwd_inner_microstep: 4959.42 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3695 [2024-07-31 16:24:30,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.09 | bwd_microstep: 4889.50 | bwd_inner_microstep: 4870.17 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3635 [2024-07-31 16:24:38,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.82 | bwd_microstep: 4931.87 | bwd_inner_microstep: 4897.65 | bwd_allreduce_microstep: 34.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 16:24:47,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.34 | bwd_microstep: 4929.60 | bwd_inner_microstep: 4901.56 | bwd_allreduce_microstep: 27.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 16:24:56,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 16:24:56,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.28 | bwd_microstep: 5094.10 | bwd_inner_microstep: 4699.54 | bwd_allreduce_microstep: 394.49 | step_microstep: 213.27 [2024-07-31 16:24:56,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28914.49 | bwd: 40667.84 | bwd_inner: 39663.69 | bwd_allreduce: 1003.65 | step: 213.86 59%|█████▉ | 724/1230 [14:13:02<9:46:24, 69.54s/it] {'loss': 1.152, 'learning_rate': 7.639308795405066e-06, 'epoch': 0.59} 59%|█████▉ | 724/1230 [14:13:02<9:46:24, 69.54s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3853 [2024-07-31 16:25:05,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.02 | bwd_microstep: 5469.07 | bwd_inner_microstep: 5397.76 | bwd_allreduce_microstep: 71.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-31 16:25:14,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3540.71 | bwd_microstep: 5245.08 | bwd_inner_microstep: 4836.83 | bwd_allreduce_microstep: 408.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-31 16:25:23,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.09 | bwd_microstep: 5125.68 | bwd_inner_microstep: 5053.87 | bwd_allreduce_microstep: 71.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-31 16:25:32,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.28 | bwd_microstep: 5005.80 | bwd_inner_microstep: 4986.51 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 16:25:40,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.46 | bwd_microstep: 5002.79 | bwd_inner_microstep: 4979.65 | bwd_allreduce_microstep: 23.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 16:25:49,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.89 | bwd_microstep: 5194.20 | bwd_inner_microstep: 4787.66 | bwd_allreduce_microstep: 406.47 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1621 [2024-07-31 16:25:58,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.77 | bwd_microstep: 5321.13 | bwd_inner_microstep: 4878.44 | bwd_allreduce_microstep: 442.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 16:26:07,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 16:26:07,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.99 | bwd_microstep: 4918.26 | bwd_inner_microstep: 4892.55 | bwd_allreduce_microstep: 25.65 | step_microstep: 
181.72 [2024-07-31 16:26:07,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29279.12 | bwd: 41282.00 | bwd_inner: 39813.20 | bwd_allreduce: 1468.31 | step: 182.29 59%|█████▉ | 725/1230 [14:14:13<9:48:40, 69.94s/it] {'loss': 1.125, 'learning_rate': 7.613727743479395e-06, 'epoch': 0.59} 59%|█████▉ | 725/1230 [14:14:13<9:48:40, 69.94s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305 [2024-07-31 16:26:16,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.00 | bwd_microstep: 5320.17 | bwd_inner_microstep: 4914.76 | bwd_allreduce_microstep: 405.34 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-31 16:26:25,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.92 | bwd_microstep: 5339.60 | bwd_inner_microstep: 4925.75 | bwd_allreduce_microstep: 413.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 16:26:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.16 | bwd_microstep: 5181.97 | bwd_inner_microstep: 5100.53 | bwd_allreduce_microstep: 81.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 16:26:42,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.71 | bwd_microstep: 5106.32 | bwd_inner_microstep: 5060.56 | bwd_allreduce_microstep: 45.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 16:26:51,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.80 | bwd_microstep: 5154.82 | bwd_inner_microstep: 4755.11 | bwd_allreduce_microstep: 399.65 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 16:27:00,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3526.54 | bwd_microstep: 5123.22 | bwd_inner_microstep: 4724.00 | bwd_allreduce_microstep: 399.16 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 16:27:08,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.55 | bwd_microstep: 4979.20 | bwd_inner_microstep: 4929.91 | bwd_allreduce_microstep: 49.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 16:27:17,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.90 [2024-07-31 16:27:17,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.40 | bwd_microstep: 4875.90 | bwd_inner_microstep: 4856.59 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.88 [2024-07-31 16:27:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28629.97 | bwd: 41081.19 | bwd_inner: 39267.14 | bwd_allreduce: 1813.56 | step: 182.50 59%|█████▉ | 726/1230 [14:15:23<9:47:45, 69.97s/it] {'loss': 1.1368, 'learning_rate': 7.588163239290316e-06, 'epoch': 0.59} 59%|█████▉ | 726/1230 [14:15:23<9:47:45, 69.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4024 [2024-07-31 16:27:26,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.74 | bwd_microstep: 5266.37 | bwd_inner_microstep: 5247.30 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-31 16:27:35,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.22 | bwd_microstep: 5046.91 | bwd_inner_microstep: 5027.56 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 16:27:44,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.25 | bwd_microstep: 5150.45 | bwd_inner_microstep: 5071.32 | 
bwd_allreduce_microstep: 79.06 | step_microstep: 0.19
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233
[2024-07-31 16:27:52,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.03 | bwd_microstep: 5230.38 | bwd_inner_microstep: 4824.96 | bwd_allreduce_microstep: 405.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741
[2024-07-31 16:28:01,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.20 | bwd_microstep: 5043.26 | bwd_inner_microstep: 5013.67 | bwd_allreduce_microstep: 29.52 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739
[2024-07-31 16:28:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.46 | bwd_microstep: 5155.80 | bwd_inner_microstep: 5105.23 | bwd_allreduce_microstep: 50.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686
[2024-07-31 16:28:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.94 | bwd_microstep: 4992.21 | bwd_inner_microstep: 4939.87 | bwd_allreduce_microstep: 52.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 16:28:27,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 16:28:27,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.95 | bwd_microstep: 4960.01 | bwd_inner_microstep: 4913.14 | bwd_allreduce_microstep: 46.80 | step_microstep: 181.49
[2024-07-31 16:28:27,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29204.71 | bwd: 40845.37 | bwd_inner: 40143.00 | bwd_allreduce: 701.87 | step: 182.17
59%|█████▉ | 727/1230 [14:16:33<9:47:37, 70.10s/it]
{'loss': 1.166, 'learning_rate': 7.5626154601162874e-06, 'epoch': 0.59}
59%|█████▉ | 727/1230 [14:16:33<9:47:37, 70.10s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3961
[2024-07-31 16:28:36,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.40 | bwd_microstep: 5456.02 | bwd_inner_microstep: 5363.05 | bwd_allreduce_microstep: 92.90 | step_microstep: 0.10
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2271
[2024-07-31 16:28:45,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.91 | bwd_microstep: 5158.47 | bwd_inner_microstep: 4756.56 | bwd_allreduce_microstep: 401.84 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756
[2024-07-31 16:28:54,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.31 | bwd_microstep: 5074.25 | bwd_inner_microstep: 5044.15 | bwd_allreduce_microstep: 30.03 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741
[2024-07-31 16:29:03,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.37 | bwd_microstep: 4990.49 | bwd_inner_microstep: 4970.96 | bwd_allreduce_microstep: 19.46 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2080
[2024-07-31 16:29:11,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.35 | bwd_microstep: 5073.42 | bwd_inner_microstep: 4679.26 | bwd_allreduce_microstep: 394.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 16:29:20,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.44 | bwd_microstep: 4885.99 | bwd_inner_microstep: 4866.63 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 16:29:28,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.77 | bwd_microstep: 4718.26 | bwd_inner_microstep: 4697.92 | bwd_allreduce_microstep: 20.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2133
[2024-07-31 16:29:37,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 16:29:37,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.38 | bwd_microstep: 5067.68 | bwd_inner_microstep: 4674.43 | bwd_allreduce_microstep: 393.18 | step_microstep: 181.92
[2024-07-31 16:29:37,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28550.84 | bwd: 40424.57 | bwd_inner: 39052.90 | bwd_allreduce: 1371.18 | step: 182.53
59%|█████▉ | 728/1230 [14:17:42<9:44:28, 69.86s/it]
{'loss': 1.1855, 'learning_rate': 7.537084583119802e-06, 'epoch': 0.59}
59%|█████▉ | 728/1230 [14:17:42<9:44:28, 69.86s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2358
[2024-07-31 16:29:46,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.10 | bwd_microstep: 6105.91 | bwd_inner_microstep: 5401.64 | bwd_allreduce_microstep: 704.21 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3435
[2024-07-31 16:29:55,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.17 | bwd_microstep: 5311.91 | bwd_inner_microstep: 5132.32 | bwd_allreduce_microstep: 179.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811
[2024-07-31 16:30:04,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.03 | bwd_microstep: 5071.47 | bwd_inner_microstep: 5047.22 | bwd_allreduce_microstep: 24.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839
[2024-07-31 16:30:12,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.59 | bwd_microstep: 4853.49 | bwd_inner_microstep: 4834.12 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706
[2024-07-31 16:30:21,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.18 | bwd_microstep: 5022.96 | bwd_inner_microstep: 4967.60 | bwd_allreduce_microstep: 55.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698
[2024-07-31 16:30:29,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.27 | bwd_microstep: 5050.44 | bwd_inner_microstep: 4996.01 | bwd_allreduce_microstep: 54.37 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719
[2024-07-31 16:30:38,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.45 | bwd_microstep: 4978.82 | bwd_inner_microstep: 4959.55 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672
[2024-07-31 16:30:47,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 16:30:47,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.87 | bwd_microstep: 5015.81 | bwd_inner_microstep: 4959.88 | bwd_allreduce_microstep: 55.86 | step_microstep: 184.85
[2024-07-31 16:30:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28639.56 | bwd: 41410.80 | bwd_inner: 40298.27 | bwd_allreduce: 1112.05 | step: 185.44
59%|█████▉ | 729/1230 [14:18:53<9:44:37, 70.02s/it]
{'loss': 1.115, 'learning_rate': 7.51157078534613e-06, 'epoch': 0.59}
59%|█████▉ | 729/1230 [14:18:53<9:44:37, 70.02s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4003
[2024-07-31 16:30:56,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3830.21 | bwd_microstep: 5307.26 | bwd_inner_microstep: 5288.23 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838
[2024-07-31 16:31:04,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.60 | bwd_microstep: 4874.52 | bwd_inner_microstep: 4855.30 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769
[2024-07-31 16:31:13,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.44 | bwd_microstep: 5007.46 | bwd_inner_microstep: 4988.14 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2244
[2024-07-31 16:31:22,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.18 | bwd_microstep: 5183.69 | bwd_inner_microstep: 4778.11 | bwd_allreduce_microstep: 405.51 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613
[2024-07-31 16:31:30,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.91 | bwd_microstep: 5118.84 | bwd_inner_microstep: 5049.37 | bwd_allreduce_microstep: 69.40 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2125
[2024-07-31 16:31:39,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.98 | bwd_microstep: 5240.07 | bwd_inner_microstep: 4833.15 | bwd_allreduce_microstep: 406.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632
[2024-07-31 16:31:48,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.49 | bwd_microstep: 5027.45 | bwd_inner_microstep: 4967.53 | bwd_allreduce_microstep: 59.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683
[2024-07-31 16:31:57,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 16:31:57,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.57 | bwd_microstep: 5054.85 | bwd_inner_microstep: 4996.24 | bwd_allreduce_microstep: 58.55 | step_microstep: 181.49
[2024-07-31 16:31:57,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28659.29 | bwd: 40814.14 | bwd_inner: 39756.01 | bwd_allreduce: 1057.63 | step: 182.07
59%|█████▉ | 730/1230 [14:20:03<9:42:55, 69.95s/it]
{'loss': 1.2038, 'learning_rate': 7.486074243722111e-06, 'epoch': 0.59}
59%|█████▉ | 730/1230 [14:20:03<9:42:55, 69.95s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2351
[2024-07-31 16:32:06,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.68 | bwd_microstep: 5666.54 | bwd_inner_microstep: 5230.62 | bwd_allreduce_microstep: 435.86 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2201
[2024-07-31 16:32:15,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.75 | bwd_microstep: 5299.28 | bwd_inner_microstep: 4890.42 | bwd_allreduce_microstep: 408.80 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2061
[2024-07-31 16:32:24,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.44 | bwd_microstep: 5212.03 | bwd_inner_microstep: 4807.43 | bwd_allreduce_microstep: 404.54 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3618
[2024-07-31 16:32:33,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.74 | bwd_microstep: 5199.29 | bwd_inner_microstep: 5103.31 | bwd_allreduce_microstep: 95.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206
[2024-07-31 16:32:41,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.65 | bwd_microstep: 5188.87 | bwd_inner_microstep: 4785.31 | bwd_allreduce_microstep: 403.49 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 16:32:50,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.53 | bwd_microstep: 5001.73 | bwd_inner_microstep: 4982.41 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754
[2024-07-31 16:32:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.41 | bwd_microstep: 5002.63 | bwd_inner_microstep: 4983.10 | bwd_allreduce_microstep: 19.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165
[2024-07-31 16:33:08,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 16:33:08,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.93 | bwd_microstep: 5131.90 | bwd_inner_microstep: 4734.20 | bwd_allreduce_microstep: 397.63 | step_microstep: 182.90
[2024-07-31 16:33:08,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29002.03 | bwd: 41702.26 | bwd_inner: 39516.73 | bwd_allreduce: 2185.03 | step: 183.50
59%|█████▉ | 731/1230 [14:21:14<9:44:27, 70.28s/it]
{'loss': 1.2054, 'learning_rate': 7.460595135054914e-06, 'epoch': 0.59}
59%|█████▉ | 731/1230 [14:21:14<9:44:27, 70.28s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3873
[2024-07-31 16:33:17,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.01 | bwd_microstep: 5158.33 | bwd_inner_microstep: 5134.06 | bwd_allreduce_microstep: 24.21 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3570
[2024-07-31 16:33:26,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.10 | bwd_microstep: 5146.94 | bwd_inner_microstep: 5074.76 | bwd_allreduce_microstep: 72.12 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747
[2024-07-31 16:33:34,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.58 | bwd_microstep: 5001.86 | bwd_inner_microstep: 4982.50 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711
[2024-07-31 16:33:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3341.18 | bwd_microstep: 4906.76 | bwd_inner_microstep: 4866.68 | bwd_allreduce_microstep: 40.01 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167
[2024-07-31 16:33:51,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.23 | bwd_microstep: 5057.78 | bwd_inner_microstep: 4665.57 | bwd_allreduce_microstep: 392.14 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 16:33:59,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.71 | bwd_microstep: 4871.93 | bwd_inner_microstep: 4495.80 | bwd_allreduce_microstep: 376.06 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124
[2024-07-31 16:34:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.62 | bwd_microstep: 5065.19 | bwd_inner_microstep: 4672.45 | bwd_allreduce_microstep: 392.68 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667
[2024-07-31 16:34:16,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 16:34:16,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.88 | bwd_microstep: 4737.39 | bwd_inner_microstep: 4710.68 | bwd_allreduce_microstep: 26.65 | step_microstep: 183.25
[2024-07-31 16:34:16,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27641.21 | bwd: 39946.18 | bwd_inner: 38602.44 | bwd_allreduce: 1343.25 | step: 183.94
60%|█████▉ | 732/1230 [14:22:22<9:37:25, 69.57s/it]
{'loss': 1.1364, 'learning_rate': 7.435133636030831e-06, 'epoch': 0.6}
60%|█████▉ | 732/1230 [14:22:22<9:37:25, 69.57s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3951
[2024-07-31 16:34:25,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.61 | bwd_microstep: 5606.52 | bwd_inner_microstep: 5509.48 | bwd_allreduce_microstep: 96.98 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3981
[2024-07-31 16:34:34,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.40 | bwd_microstep: 5364.11 | bwd_inner_microstep: 5316.90 | bwd_allreduce_microstep: 47.14 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207
[2024-07-31 16:34:43,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.02 | bwd_microstep: 5280.29 | bwd_inner_microstep: 4871.33 | bwd_allreduce_microstep: 408.89 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 16:34:52,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.21 | bwd_microstep: 5153.72 | bwd_inner_microstep: 5077.49 | bwd_allreduce_microstep: 76.17 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184
[2024-07-31 16:35:00,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3008.39 | bwd_microstep: 4874.06 | bwd_inner_microstep: 4498.97 | bwd_allreduce_microstep: 375.03 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 16:35:08,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.18 | bwd_microstep: 5133.50 | bwd_inner_microstep: 4737.65 | bwd_allreduce_microstep: 395.77 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147
[2024-07-31 16:35:17,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.20 | bwd_microstep: 5070.62 | bwd_inner_microstep: 4678.65 | bwd_allreduce_microstep: 391.90 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 16:35:26,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 16:35:26,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.28 | bwd_microstep: 5376.46 | bwd_inner_microstep: 5199.63 | bwd_allreduce_microstep: 176.77 | step_microstep: 182.30
[2024-07-31 16:35:26,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28234.18 | bwd: 41859.27 | bwd_inner: 39890.04 | bwd_allreduce: 1968.75 | step: 182.92
60%|█████▉ | 733/1230 [14:23:32<9:38:23, 69.83s/it]
{'loss': 1.1184, 'learning_rate': 7.4096899232140295e-06, 'epoch': 0.6}
60%|█████▉ | 733/1230 [14:23:32<9:38:23, 69.83s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874
[2024-07-31 16:35:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.33 | bwd_microstep: 5159.22 | bwd_inner_microstep: 5133.91 | bwd_allreduce_microstep: 25.25 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2263
[2024-07-31 16:35:44,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.79 | bwd_microstep: 5518.41 | bwd_inner_microstep: 5092.75 | bwd_allreduce_microstep: 425.59 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827
[2024-07-31 16:35:52,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3261.92 | bwd_microstep: 4855.12 | bwd_inner_microstep: 4835.85 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814
[2024-07-31 16:36:01,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.97 | bwd_microstep: 5044.83 | bwd_inner_microstep: 5024.90 | bwd_allreduce_microstep: 19.86 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234
[2024-07-31 16:36:10,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.24 | bwd_microstep: 5246.13 | bwd_inner_microstep: 4836.72 | bwd_allreduce_microstep: 409.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 16:36:19,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.34 | bwd_microstep: 5170.44 | bwd_inner_microstep: 5091.46 | bwd_allreduce_microstep: 78.91 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3767
[2024-07-31 16:36:28,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.67 | bwd_microstep: 5020.32 | bwd_inner_microstep: 5000.97 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657
[2024-07-31 16:36:36,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75
[2024-07-31 16:36:36,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3180.60 | bwd_microstep: 4723.75 | bwd_inner_microstep: 4700.78 | bwd_allreduce_microstep: 22.90 | step_microstep: 181.45
[2024-07-31 16:36:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28529.75 | bwd: 40738.21 | bwd_inner: 39717.28 | bwd_allreduce: 1020.45 | step: 182.04
60%|█████▉ | 734/1230 [14:24:42<9:36:40, 69.76s/it]
{'loss': 1.1973, 'learning_rate': 7.384264173045335e-06, 'epoch': 0.6}
60%|█████▉ | 734/1230 [14:24:42<9:36:40, 69.76s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908
[2024-07-31 16:36:45,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.51 | bwd_microstep: 5205.97 | bwd_inner_microstep: 5186.91 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 16:36:54,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.76 | bwd_microstep: 5349.59 | bwd_inner_microstep: 4934.30 | bwd_allreduce_microstep: 415.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768
[2024-07-31 16:37:03,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.74 | bwd_microstep: 5205.58 | bwd_inner_microstep: 5163.44 | bwd_allreduce_microstep: 42.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716
[2024-07-31 16:37:12,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.13 | bwd_microstep: 5143.82 | bwd_inner_microstep: 5088.87 | bwd_allreduce_microstep: 54.88 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679
[2024-07-31 16:37:20,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.70 | bwd_microstep: 5125.72 | bwd_inner_microstep: 5071.72 | bwd_allreduce_microstep: 53.93 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805
[2024-07-31 16:37:29,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.55 | bwd_microstep: 5049.29 | bwd_inner_microstep: 5029.98 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 16:37:38,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.76 | bwd_microstep: 4998.39 | bwd_inner_microstep: 4941.76 | bwd_allreduce_microstep: 56.56 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3660
[2024-07-31 16:37:47,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 16:37:47,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.39 | bwd_microstep: 5058.02 | bwd_inner_microstep: 4977.10 | bwd_allreduce_microstep: 80.85 | step_microstep: 182.11
[2024-07-31 16:37:47,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29458.45 | bwd: 41136.36 | bwd_inner: 40394.02 | bwd_allreduce: 741.85 | step: 182.69
60%|█████▉ | 735/1230 [14:25:53<9:38:25, 70.11s/it]
{'loss': 1.1593, 'learning_rate': 7.358856561841021e-06, 'epoch': 0.6}
60%|█████▉ | 735/1230 [14:25:53<9:38:25, 70.11s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2319
[2024-07-31 16:37:55,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.48 | bwd_microstep: 5245.21 | bwd_inner_microstep: 4840.49 | bwd_allreduce_microstep: 404.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827
[2024-07-31 16:38:04,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.18 | bwd_microstep: 5219.31 | bwd_inner_microstep: 5164.90 | bwd_allreduce_microstep: 54.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785
[2024-07-31 16:38:13,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.08 | bwd_microstep: 5058.01 | bwd_inner_microstep: 5032.04 | bwd_allreduce_microstep: 25.90 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619
[2024-07-31 16:38:22,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.54 | bwd_microstep: 5144.54 | bwd_inner_microstep: 5073.28 | bwd_allreduce_microstep: 71.19 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3827
[2024-07-31 16:38:31,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.15 | bwd_microstep: 5063.12 | bwd_inner_microstep: 5043.77 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724
[2024-07-31 16:38:40,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.50 | bwd_microstep: 5002.17 | bwd_inner_microstep: 4982.76 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3648
[2024-07-31 16:38:48,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.00 | bwd_microstep: 5268.87 | bwd_inner_microstep: 5163.15 | bwd_allreduce_microstep: 105.66 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3682
[2024-07-31 16:38:57,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 16:38:57,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.97 | bwd_microstep: 5068.67 | bwd_inner_microstep: 4992.46 | bwd_allreduce_microstep: 76.13 | step_microstep: 181.84
[2024-07-31 16:38:57,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29245.81 | bwd: 41069.88 | bwd_inner: 40292.78 | bwd_allreduce: 776.60 | step: 182.45
60%|█████▉ | 736/1230 [14:27:03<9:38:34, 70.27s/it]
{'loss': 1.1763, 'learning_rate': 7.333467265791565e-06, 'epoch': 0.6}
60%|█████▉ | 736/1230 [14:27:03<9:38:34, 70.27s/it]
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3931
[2024-07-31 16:39:06,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.76 | bwd_microstep: 5090.56 | bwd_inner_microstep: 5049.31 | bwd_allreduce_microstep: 41.19 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759
[2024-07-31 16:39:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.86 | bwd_microstep: 5042.38 | bwd_inner_microstep: 5015.96 | bwd_allreduce_microstep: 26.35 | step_microstep: 0.08
dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1212
[2024-07-31 16:39:23,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.41 | bwd_microstep: 5074.92 | bwd_inner_microstep: 4685.41 | bwd_allreduce_microstep: 389.45 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819
[2024-07-31 16:39:32,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.23 | bwd_microstep: 5047.29 | bwd_inner_microstep: 5027.73 | bwd_allreduce_microstep: 19.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 16:39:41,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.12 | bwd_microstep: 5217.87 | bwd_inner_microstep: 4813.92 | bwd_allreduce_microstep: 403.89 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1733
[2024-07-31 16:39:50,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.39 | bwd_microstep: 5163.34 | bwd_inner_microstep: 4765.55 | bwd_allreduce_microstep: 397.72 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 16:39:58,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.11 | bwd_microstep: 5078.19 | bwd_inner_microstep: 5016.81 | bwd_allreduce_microstep: 61.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672
[2024-07-31 16:40:07,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 16:40:07,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.62 | bwd_microstep: 5129.05 | bwd_inner_microstep: 5059.28 | bwd_allreduce_microstep: 69.69 | step_microstep: 181.34
[2024-07-31 16:40:07,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28816.39 | bwd: 40843.59 | bwd_inner: 39433.91 | bwd_allreduce: 1409.19 | step: 182.04
60%|█████▉ | 737/1230 [14:28:13<9:36:43, 70.19s/it]
{'loss': 1.1532, 'learning_rate': 7.308096460960443e-06, 'epoch': 0.6}
60%|█████▉ | 737/1230 [14:28:13<9:36:43, 70.19s/it]
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3150
[2024-07-31 16:40:16,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.36 | bwd_microstep: 5249.74 | bwd_inner_microstep: 4976.08 | bwd_allreduce_microstep: 273.59 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2323
[2024-07-31 16:40:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.70 | bwd_microstep: 5236.14 | bwd_inner_microstep: 4826.87 | bwd_allreduce_microstep: 409.20 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3041
[2024-07-31 16:40:34,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.72 | bwd_microstep: 5240.47 | bwd_inner_microstep: 4926.52 | bwd_allreduce_microstep: 313.88 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734
[2024-07-31 16:40:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.52 | bwd_microstep: 5011.36 | bwd_inner_microstep: 4985.77 | bwd_allreduce_microstep: 25.52 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710
[2024-07-31 16:40:51,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.37 | bwd_microstep: 4962.78 | bwd_inner_microstep: 4919.45 | bwd_allreduce_microstep: 43.26 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3631
[2024-07-31 16:41:00,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.35 | bwd_microstep: 5191.20 | bwd_inner_microstep: 5101.05 | bwd_allreduce_microstep: 90.08 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2078
[2024-07-31 16:41:09,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.44 | bwd_microstep: 5055.08 | bwd_inner_microstep: 4662.28 | bwd_allreduce_microstep: 392.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693
[2024-07-31 16:41:18,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75
[2024-07-31 16:41:18,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.51 | bwd_microstep: 5016.63 | bwd_inner_microstep: 4959.99 | bwd_allreduce_microstep: 56.57 | step_microstep: 181.82
[2024-07-31 16:41:18,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28909.89 | bwd: 40963.39 | bwd_inner: 39357.96 | bwd_allreduce: 1604.94 | step: 182.43
60%|██████ | 738/1230 [14:29:23<9:35:35, 70.19s/it]
{'loss': 1.1718, 'learning_rate': 7.2827443232828935e-06, 'epoch': 0.6}
60%|██████ | 738/1230 [14:29:23<9:35:35, 70.19s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 16:41:27,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.75 | bwd_microstep: 5336.28 | bwd_inner_microstep: 5317.02 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08
dynamic ViT batch size: 21, images per sample: 10.5, dynamic token length: 3321
[2024-07-31 16:41:35,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3434.88 | bwd_microstep: 5128.86 | bwd_inner_microstep: 4987.33 | bwd_allreduce_microstep: 141.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124
[2024-07-31 16:41:44,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.29 | bwd_microstep: 5287.35 | bwd_inner_microstep: 4876.55 | bwd_allreduce_microstep: 410.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745
[2024-07-31 16:41:53,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.85 | bwd_microstep: 5369.17 | bwd_inner_microstep: 5287.45 | bwd_allreduce_microstep: 81.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 16:42:02,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.83 | bwd_microstep: 5152.54 | bwd_inner_microstep: 5102.24 | bwd_allreduce_microstep: 50.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 16:42:11,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.25 | bwd_microstep: 5164.26 | bwd_inner_microstep: 5086.05 | bwd_allreduce_microstep: 78.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 16:42:19,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.61 | bwd_microstep: 5060.02 | bwd_inner_microstep: 5005.88 | bwd_allreduce_microstep: 54.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 16:42:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 16:42:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.98 | bwd_microstep: 4811.74 | bwd_inner_microstep: 4774.11 | bwd_allreduce_microstep: 37.57 | step_microstep: 181.48
[2024-07-31 16:42:28,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28542.34 | bwd: 41310.19 | bwd_inner: 40436.57 | bwd_allreduce: 873.14 | step: 182.06
60%|██████ | 739/1230 [14:30:34<9:34:23, 70.19s/it]
{'loss': 1.1647, 'learning_rate': 7.2574110285647244e-06, 'epoch': 0.6}
60%|██████ | 739/1230 [14:30:34<9:34:23, 70.19s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4083
[2024-07-31 16:42:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.15 | bwd_microstep: 5448.84 | bwd_inner_microstep: 5406.17 | bwd_allreduce_microstep: 42.59 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733
[2024-07-31 16:42:46,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.76 | bwd_microstep: 5010.42 | bwd_inner_microstep: 4988.57 | bwd_allreduce_microstep: 21.78 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217
[2024-07-31 16:42:54,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.85 | bwd_microstep: 5134.13 | bwd_inner_microstep: 4733.18 | bwd_allreduce_microstep: 400.88 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171
[2024-07-31 16:43:03,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.77 | bwd_microstep: 5208.72 | bwd_inner_microstep: 4802.05 | bwd_allreduce_microstep: 406.60 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3753
[2024-07-31 16:43:12,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.67 | bwd_microstep: 4992.06 | bwd_inner_microstep: 4972.77 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 16:43:20,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.52 | bwd_microstep: 5037.84 | bwd_inner_microstep: 4981.00 | bwd_allreduce_microstep: 56.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 16:43:29,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.29 | bwd_microstep: 5005.20 | bwd_inner_microstep: 4950.04 | bwd_allreduce_microstep: 55.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3668
[2024-07-31 16:43:38,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 16:43:38,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.45 | bwd_microstep: 4934.90 | bwd_inner_microstep: 4904.80 | bwd_allreduce_microstep: 30.02 | step_microstep: 181.62
[2024-07-31 16:43:38,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29062.37 | bwd: 40772.07 | bwd_inner: 39738.54 | bwd_allreduce: 1033.05 | step: 182.20
60%|██████ | 740/1230 [14:31:44<9:33:09, 70.18s/it]
{'loss': 1.1626, 'learning_rate': 7.232096752481061e-06, 'epoch': 0.6}
60%|██████ | 740/1230 [14:31:44<9:33:09, 70.18s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3911
[2024-07-31 16:43:47,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.12 | bwd_microstep: 5356.53 | bwd_inner_microstep: 5294.06 | bwd_allreduce_microstep: 62.41 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3886
[2024-07-31 16:43:56,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.94 | bwd_microstep: 5131.64 | bwd_inner_microstep: 5112.25 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3850
[2024-07-31 16:44:04,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3428.71 | bwd_microstep: 5014.30 | bwd_inner_microstep: 4986.23 | bwd_allreduce_microstep: 28.00 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251
[2024-07-31 16:44:13,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.96 | bwd_microstep: 5155.66 | bwd_inner_microstep: 4757.06 | bwd_allreduce_microstep: 398.53 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741
[2024-07-31 16:44:22,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.89 | bwd_microstep: 5152.24 | bwd_inner_microstep: 5095.88 | bwd_allreduce_microstep: 56.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample:
7.0, dynamic token length: 2122 [2024-07-31 16:44:30,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.16 | bwd_microstep: 5130.19 | bwd_inner_microstep: 4735.09 | bwd_allreduce_microstep: 395.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 16:44:39,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.64 | bwd_microstep: 5139.28 | bwd_inner_microstep: 5060.43 | bwd_allreduce_microstep: 78.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-31 16:44:48,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-07-31 16:44:48,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.04 | bwd_microstep: 5006.95 | bwd_inner_microstep: 4966.66 | bwd_allreduce_microstep: 40.22 | step_microstep: 181.82 [2024-07-31 16:44:48,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28722.36 | bwd: 41086.77 | bwd_inner: 40007.61 | bwd_allreduce: 1078.67 | step: 182.41 60%|██████ | 741/1230 [14:32:54<9:31:53, 70.17s/it] {'loss': 1.1236, 'learning_rate': 7.206801670575145e-06, 'epoch': 0.6} 60%|██████ | 741/1230 [14:32:54<9:31:53, 70.17s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 16:44:58,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3871.19 | bwd_microstep: 5737.92 | bwd_inner_microstep: 5718.73 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762 [2024-07-31 16:45:07,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.59 | bwd_microstep: 5105.24 | bwd_inner_microstep: 5071.43 | bwd_allreduce_microstep: 33.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-31 16:45:15,231] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.55 | bwd_microstep: 4937.46 | bwd_inner_microstep: 4876.84 | bwd_allreduce_microstep: 60.55 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-31 16:45:23,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.31 | bwd_microstep: 5012.46 | bwd_inner_microstep: 4988.33 | bwd_allreduce_microstep: 24.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 16:45:32,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.93 | bwd_microstep: 4985.18 | bwd_inner_microstep: 4965.85 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-31 16:45:41,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.35 | bwd_microstep: 4967.07 | bwd_inner_microstep: 4937.26 | bwd_allreduce_microstep: 29.75 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3704 [2024-07-31 16:45:50,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.23 | bwd_microstep: 5043.34 | bwd_inner_microstep: 4971.74 | bwd_allreduce_microstep: 71.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3657 [2024-07-31 16:45:58,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 16:45:58,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.80 | bwd_microstep: 5007.65 | bwd_inner_microstep: 4932.86 | bwd_allreduce_microstep: 74.72 | step_microstep: 182.85 [2024-07-31 16:45:58,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29202.86 | bwd: 40796.31 | bwd_inner: 40462.98 | bwd_allreduce: 332.83 | step: 183.54 60%|██████ | 742/1230 [14:34:04<9:31:07, 70.22s/it] {'loss': 1.156, 
'learning_rate': 7.181525958257116e-06, 'epoch': 0.6} 60%|██████ | 742/1230 [14:34:04<9:31:07, 70.22s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-31 16:46:07,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.36 | bwd_microstep: 5310.68 | bwd_inner_microstep: 5239.46 | bwd_allreduce_microstep: 71.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-31 16:46:16,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.00 | bwd_microstep: 5157.52 | bwd_inner_microstep: 5106.53 | bwd_allreduce_microstep: 50.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 16:46:25,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3324.09 | bwd_microstep: 5149.65 | bwd_inner_microstep: 5089.52 | bwd_allreduce_microstep: 60.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 16:46:34,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.61 | bwd_microstep: 5284.24 | bwd_inner_microstep: 5188.74 | bwd_allreduce_microstep: 95.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 16:46:42,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.23 | bwd_microstep: 5054.88 | bwd_inner_microstep: 4991.86 | bwd_allreduce_microstep: 62.95 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-31 16:46:51,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.94 | bwd_microstep: 5129.77 | bwd_inner_microstep: 5081.65 | bwd_allreduce_microstep: 48.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2119 [2024-07-31 16:46:59,971] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.23 | bwd_microstep: 5078.29 | bwd_inner_microstep: 4684.33 | bwd_allreduce_microstep: 393.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 16:47:08,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 16:47:08,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.26 | bwd_microstep: 4705.05 | bwd_inner_microstep: 4678.13 | bwd_allreduce_microstep: 26.85 | step_microstep: 182.31 [2024-07-31 16:47:08,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28029.65 | bwd: 40870.05 | bwd_inner: 40060.14 | bwd_allreduce: 809.43 | step: 182.91 60%|██████ | 743/1230 [14:35:13<9:27:32, 69.92s/it] {'loss': 1.1582, 'learning_rate': 7.156269790802801e-06, 'epoch': 0.6} 60%|██████ | 743/1230 [14:35:13<9:27:32, 69.92s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4000 [2024-07-31 16:47:17,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.43 | bwd_microstep: 5253.86 | bwd_inner_microstep: 5234.76 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2348 [2024-07-31 16:47:25,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.97 | bwd_microstep: 5217.40 | bwd_inner_microstep: 4811.72 | bwd_allreduce_microstep: 405.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 16:47:34,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.34 | bwd_microstep: 5081.30 | bwd_inner_microstep: 5015.54 | bwd_allreduce_microstep: 65.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 16:47:43,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.30 | 
bwd_microstep: 4982.04 | bwd_inner_microstep: 4962.67 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2183 [2024-07-31 16:47:52,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.13 | bwd_microstep: 5155.48 | bwd_inner_microstep: 4754.54 | bwd_allreduce_microstep: 400.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 16:48:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.19 | bwd_microstep: 4913.63 | bwd_inner_microstep: 4889.24 | bwd_allreduce_microstep: 24.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 16:48:09,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.83 | bwd_microstep: 5062.02 | bwd_inner_microstep: 5004.15 | bwd_allreduce_microstep: 57.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 16:48:18,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 16:48:18,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.52 | bwd_microstep: 5011.23 | bwd_inner_microstep: 4958.96 | bwd_allreduce_microstep: 52.20 | step_microstep: 181.83 [2024-07-31 16:48:18,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29070.62 | bwd: 40676.93 | bwd_inner: 39631.53 | bwd_allreduce: 1044.90 | step: 182.41 60%|██████ | 744/1230 [14:36:24<9:26:47, 69.97s/it] {'loss': 1.1344, 'learning_rate': 7.131033343352485e-06, 'epoch': 0.6} 60%|██████ | 744/1230 [14:36:24<9:26:47, 69.97s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-07-31 16:48:27,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.73 | bwd_microstep: 5494.75 | bwd_inner_microstep: 5372.36 | bwd_allreduce_microstep: 
122.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-31 16:48:36,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.34 | bwd_microstep: 5121.91 | bwd_inner_microstep: 5052.92 | bwd_allreduce_microstep: 68.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-31 16:48:45,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.83 | bwd_microstep: 5323.38 | bwd_inner_microstep: 5227.68 | bwd_allreduce_microstep: 95.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 16:48:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.19 | bwd_microstep: 5197.85 | bwd_inner_microstep: 5113.90 | bwd_allreduce_microstep: 83.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3642 [2024-07-31 16:49:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.58 | bwd_microstep: 4836.67 | bwd_inner_microstep: 4817.23 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3605 [2024-07-31 16:49:11,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.57 | bwd_microstep: 5105.23 | bwd_inner_microstep: 5008.49 | bwd_allreduce_microstep: 96.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 16:49:19,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.72 | bwd_microstep: 5007.30 | bwd_inner_microstep: 4952.47 | bwd_allreduce_microstep: 54.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 16:49:28,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 16:49:28,560] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.52 | bwd_microstep: 4948.55 | bwd_inner_microstep: 4923.93 | bwd_allreduce_microstep: 24.55 | step_microstep: 182.74 [2024-07-31 16:49:28,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29026.38 | bwd: 41035.61 | bwd_inner: 40468.93 | bwd_allreduce: 566.20 | step: 183.33 61%|██████ | 745/1230 [14:37:34<9:26:39, 70.10s/it] {'loss': 1.2174, 'learning_rate': 7.105816790909696e-06, 'epoch': 0.61} 61%|██████ | 745/1230 [14:37:34<9:26:39, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 16:49:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.94 | bwd_microstep: 5415.92 | bwd_inner_microstep: 5376.04 | bwd_allreduce_microstep: 39.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3987 [2024-07-31 16:49:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.77 | bwd_microstep: 5163.29 | bwd_inner_microstep: 5135.13 | bwd_allreduce_microstep: 28.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 16:49:55,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.69 | bwd_microstep: 5170.37 | bwd_inner_microstep: 5114.70 | bwd_allreduce_microstep: 55.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 16:50:04,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.19 | bwd_microstep: 4994.48 | bwd_inner_microstep: 4975.03 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 16:50:12,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3242.99 | bwd_microstep: 4888.18 | bwd_inner_microstep: 4856.84 | bwd_allreduce_microstep: 31.28 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-31 16:50:21,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.91 | bwd_microstep: 5181.82 | bwd_inner_microstep: 4779.32 | bwd_allreduce_microstep: 402.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 16:50:29,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.32 | bwd_microstep: 5014.06 | bwd_inner_microstep: 4954.80 | bwd_allreduce_microstep: 59.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 16:50:38,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 16:50:38,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.28 | bwd_microstep: 4882.80 | bwd_inner_microstep: 4842.25 | bwd_allreduce_microstep: 40.48 | step_microstep: 181.73 [2024-07-31 16:50:38,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28415.98 | bwd: 40710.89 | bwd_inner: 40034.05 | bwd_allreduce: 676.36 | step: 182.32 61%|██████ | 746/1230 [14:38:43<9:23:56, 69.91s/it] {'loss': 1.143, 'learning_rate': 7.080620308340022e-06, 'epoch': 0.61} 61%|██████ | 746/1230 [14:38:43<9:23:56, 69.91s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3943 [2024-07-31 16:50:47,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.32 | bwd_microstep: 5298.63 | bwd_inner_microstep: 5247.23 | bwd_allreduce_microstep: 51.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3878 [2024-07-31 16:50:55,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.91 | bwd_microstep: 5118.96 | bwd_inner_microstep: 5099.59 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3599 [2024-07-31 16:51:04,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.90 | bwd_microstep: 5128.15 | bwd_inner_microstep: 5053.23 | bwd_allreduce_microstep: 74.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3628 [2024-07-31 16:51:13,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.08 | bwd_microstep: 5127.10 | bwd_inner_microstep: 5031.49 | bwd_allreduce_microstep: 95.54 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2181 [2024-07-31 16:51:22,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.86 | bwd_microstep: 5148.22 | bwd_inner_microstep: 4747.27 | bwd_allreduce_microstep: 400.87 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2173 [2024-07-31 16:51:30,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.58 | bwd_microstep: 5059.37 | bwd_inner_microstep: 4667.84 | bwd_allreduce_microstep: 391.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 16:51:39,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.39 | bwd_microstep: 5057.95 | bwd_inner_microstep: 4997.67 | bwd_allreduce_microstep: 60.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 16:51:48,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 16:51:48,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.15 | bwd_microstep: 5001.29 | bwd_inner_microstep: 4982.02 | bwd_allreduce_microstep: 19.20 | step_microstep: 181.70 [2024-07-31 16:51:48,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28923.09 | bwd: 40939.65 | bwd_inner: 39826.28 | bwd_allreduce: 1112.86 | step: 182.28 61%|██████ | 747/1230 
[14:39:54<9:23:27, 69.99s/it] {'loss': 1.1557, 'learning_rate': 7.055444070369852e-06, 'epoch': 0.61} 61%|██████ | 747/1230 [14:39:54<9:23:27, 69.99s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2055 [2024-07-31 16:51:57,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.94 | bwd_microstep: 5529.56 | bwd_inner_microstep: 5106.74 | bwd_allreduce_microstep: 422.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-31 16:52:06,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.72 | bwd_microstep: 5231.68 | bwd_inner_microstep: 4826.14 | bwd_allreduce_microstep: 405.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3772 [2024-07-31 16:52:14,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3273.30 | bwd_microstep: 4817.68 | bwd_inner_microstep: 4795.02 | bwd_allreduce_microstep: 22.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-31 16:52:23,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.13 | bwd_microstep: 5300.90 | bwd_inner_microstep: 5229.94 | bwd_allreduce_microstep: 70.88 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 16:52:31,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.01 | bwd_microstep: 5174.84 | bwd_inner_microstep: 4774.01 | bwd_allreduce_microstep: 400.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 16:52:40,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.23 | bwd_microstep: 5156.85 | bwd_inner_microstep: 5079.63 | bwd_allreduce_microstep: 77.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 
[2024-07-31 16:52:49,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.94 | bwd_microstep: 5008.71 | bwd_inner_microstep: 4959.08 | bwd_allreduce_microstep: 49.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3845 [2024-07-31 16:52:58,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 16:52:58,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.81 | bwd_microstep: 5109.62 | bwd_inner_microstep: 5090.32 | bwd_allreduce_microstep: 19.23 | step_microstep: 182.79 [2024-07-31 16:52:58,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28505.98 | bwd: 41329.82 | bwd_inner: 39860.81 | bwd_allreduce: 1468.52 | step: 183.47 61%|██████ | 748/1230 [14:41:04<9:22:42, 70.05s/it] {'loss': 1.1574, 'learning_rate': 7.0302882515852025e-06, 'epoch': 0.61} 61%|██████ | 748/1230 [14:41:04<9:22:42, 70.05s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 16:53:08,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3873.06 | bwd_microstep: 5756.17 | bwd_inner_microstep: 5737.09 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3891 [2024-07-31 16:53:17,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.17 | bwd_microstep: 5355.81 | bwd_inner_microstep: 5263.44 | bwd_allreduce_microstep: 92.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-07-31 16:53:26,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.00 | bwd_microstep: 5382.24 | bwd_inner_microstep: 4964.67 | bwd_allreduce_microstep: 417.51 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 16:53:34,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3726.77 | bwd_microstep: 5031.75 | bwd_inner_microstep: 5006.25 | bwd_allreduce_microstep: 25.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3627 [2024-07-31 16:53:43,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.01 | bwd_microstep: 5088.00 | bwd_inner_microstep: 5006.95 | bwd_allreduce_microstep: 80.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 16:53:52,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.07 | bwd_microstep: 4964.27 | bwd_inner_microstep: 4932.25 | bwd_allreduce_microstep: 31.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 16:54:00,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.74 | bwd_microstep: 5001.44 | bwd_inner_microstep: 4942.79 | bwd_allreduce_microstep: 58.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 16:54:09,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 16:54:09,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.90 | bwd_microstep: 5061.85 | bwd_inner_microstep: 5003.30 | bwd_allreduce_microstep: 58.49 | step_microstep: 181.55 [2024-07-31 16:54:09,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29304.64 | bwd: 41641.52 | bwd_inner: 40856.69 | bwd_allreduce: 784.36 | step: 182.14 61%|██████ | 749/1230 [14:42:15<9:24:30, 70.42s/it] {'loss': 1.1706, 'learning_rate': 7.005153026430476e-06, 'epoch': 0.61} 61%|██████ | 749/1230 [14:42:15<9:24:30, 70.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282 [2024-07-31 16:54:18,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.49 | bwd_microstep: 5584.07 | bwd_inner_microstep: 5159.61 | 
bwd_allreduce_microstep: 424.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2070 [2024-07-31 16:54:27,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.91 | bwd_microstep: 5346.01 | bwd_inner_microstep: 4934.60 | bwd_allreduce_microstep: 411.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2243 [2024-07-31 16:54:36,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.67 | bwd_microstep: 5416.14 | bwd_inner_microstep: 4996.17 | bwd_allreduce_microstep: 419.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-31 16:54:45,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.58 | bwd_microstep: 5198.72 | bwd_inner_microstep: 5141.80 | bwd_allreduce_microstep: 56.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242 [2024-07-31 16:54:54,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.73 | bwd_microstep: 5139.64 | bwd_inner_microstep: 4741.90 | bwd_allreduce_microstep: 397.67 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3751 [2024-07-31 16:55:02,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.54 | bwd_microstep: 4930.65 | bwd_inner_microstep: 4909.21 | bwd_allreduce_microstep: 21.37 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2903 [2024-07-31 16:55:11,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.79 | bwd_microstep: 5218.82 | bwd_inner_microstep: 4811.71 | bwd_allreduce_microstep: 407.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 16:55:20,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 
[2024-07-31 16:55:20,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.06 | bwd_microstep: 5060.06 | bwd_inner_microstep: 5000.53 | bwd_allreduce_microstep: 59.46 | step_microstep: 181.78 [2024-07-31 16:55:20,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28758.66 | bwd: 41894.09 | bwd_inner: 39695.47 | bwd_allreduce: 2198.13 | step: 182.36 61%|██████ | 750/1230 [14:43:26<9:24:41, 70.59s/it] {'loss': 1.1525, 'learning_rate': 6.980038569207291e-06, 'epoch': 0.61} 61%|██████ | 750/1230 [14:43:26<9:24:41, 70.59s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3953 [2024-07-31 16:55:29,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.05 | bwd_microstep: 5455.51 | bwd_inner_microstep: 5381.95 | bwd_allreduce_microstep: 73.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825 [2024-07-31 16:55:38,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.43 | bwd_microstep: 5038.66 | bwd_inner_microstep: 5019.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778 [2024-07-31 16:55:47,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.69 | bwd_microstep: 5221.26 | bwd_inner_microstep: 5166.04 | bwd_allreduce_microstep: 55.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-31 16:55:55,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3222.77 | bwd_microstep: 4807.56 | bwd_inner_microstep: 4787.33 | bwd_allreduce_microstep: 20.16 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2158 [2024-07-31 16:56:04,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.12 | bwd_microstep: 5185.25 | bwd_inner_microstep: 4781.10 | 
bwd_allreduce_microstep: 404.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 16:56:13,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.30 | bwd_microstep: 4996.77 | bwd_inner_microstep: 4976.51 | bwd_allreduce_microstep: 20.19 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 16:56:21,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.44 | bwd_microstep: 5023.09 | bwd_inner_microstep: 4970.50 | bwd_allreduce_microstep: 52.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697
[2024-07-31 16:56:30,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 16:56:30,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.80 | bwd_microstep: 4898.30 | bwd_inner_microstep: 4878.91 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.76
[2024-07-31 16:56:30,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28874.50 | bwd: 40626.38 | bwd_inner: 39961.58 | bwd_allreduce: 664.31 | step: 182.35
61%|██████ | 751/1230 [14:44:36<9:21:43, 70.36s/it]
{'loss': 1.1916, 'learning_rate': 6.954945054073229e-06, 'epoch': 0.61}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 16:56:39,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.23 | bwd_microstep: 5363.63 | bwd_inner_microstep: 5266.51 | bwd_allreduce_microstep: 97.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3853
[2024-07-31 16:56:48,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.95 | bwd_microstep: 5257.77 | bwd_inner_microstep: 5199.82 | bwd_allreduce_microstep: 57.89 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2375
[2024-07-31 16:56:57,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.05 | bwd_microstep: 5378.46 | bwd_inner_microstep: 4960.72 | bwd_allreduce_microstep: 417.67 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 16:57:06,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.44 | bwd_microstep: 5057.45 | bwd_inner_microstep: 5029.33 | bwd_allreduce_microstep: 28.06 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3624
[2024-07-31 16:57:15,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.66 | bwd_microstep: 5170.81 | bwd_inner_microstep: 5070.24 | bwd_allreduce_microstep: 100.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169
[2024-07-31 16:57:23,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.45 | bwd_microstep: 4928.76 | bwd_inner_microstep: 4547.93 | bwd_allreduce_microstep: 380.76 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771
[2024-07-31 16:57:31,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.56 | bwd_microstep: 4944.24 | bwd_inner_microstep: 4914.85 | bwd_allreduce_microstep: 29.32 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2141
[2024-07-31 16:57:40,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 16:57:40,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.32 | bwd_microstep: 5038.25 | bwd_inner_microstep: 4648.22 | bwd_allreduce_microstep: 389.96 | step_microstep: 181.54
[2024-07-31 16:57:40,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28388.58 | bwd: 41139.34 | bwd_inner: 39637.56 | bwd_allreduce: 1501.30 | step: 182.13
61%|██████ | 752/1230 [14:45:46<9:19:20, 70.21s/it]
{'loss': 1.1099, 'learning_rate': 6.9298726550406524e-06, 'epoch': 0.61}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2374
[2024-07-31 16:57:49,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.24 | bwd_microstep: 5689.62 | bwd_inner_microstep: 5254.10 | bwd_allreduce_microstep: 435.45 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240
[2024-07-31 16:57:58,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.74 | bwd_microstep: 5251.01 | bwd_inner_microstep: 4841.03 | bwd_allreduce_microstep: 409.91 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609
[2024-07-31 16:58:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.12 | bwd_microstep: 5182.19 | bwd_inner_microstep: 5095.63 | bwd_allreduce_microstep: 86.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 16:58:16,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.65 | bwd_microstep: 5240.34 | bwd_inner_microstep: 4833.39 | bwd_allreduce_microstep: 406.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736
[2024-07-31 16:58:25,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.96 | bwd_microstep: 5028.33 | bwd_inner_microstep: 5002.41 | bwd_allreduce_microstep: 25.86 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 16:58:33,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.18 | bwd_microstep: 4977.64 | bwd_inner_microstep: 4940.35 | bwd_allreduce_microstep: 37.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692
[2024-07-31 16:58:41,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.46 | bwd_microstep: 4692.80 | bwd_inner_microstep: 4672.53 | bwd_allreduce_microstep: 20.20 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3668
[2024-07-31 16:58:50,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 16:58:50,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.37 | bwd_microstep: 4884.22 | bwd_inner_microstep: 4864.54 | bwd_allreduce_microstep: 19.60 | step_microstep: 181.57
[2024-07-31 16:58:50,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28717.65 | bwd: 40946.14 | bwd_inner: 39503.93 | bwd_allreduce: 1441.72 | step: 182.15
61%|██████ | 753/1230 [14:46:56<9:17:39, 70.15s/it]
{'loss': 1.1596, 'learning_rate': 6.904821545975504e-06, 'epoch': 0.61}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2493
[2024-07-31 16:58:59,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.28 | bwd_microstep: 5698.13 | bwd_inner_microstep: 5261.27 | bwd_allreduce_microstep: 436.80 | step_microstep: 0.16
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940
[2024-07-31 16:59:08,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.54 | bwd_microstep: 5256.71 | bwd_inner_microstep: 5208.30 | bwd_allreduce_microstep: 48.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3801
[2024-07-31 16:59:17,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.63 | bwd_microstep: 5156.03 | bwd_inner_microstep: 5085.05 | bwd_allreduce_microstep: 70.91 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2200
[2024-07-31 16:59:26,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.38 | bwd_microstep: 5181.07 | bwd_inner_microstep: 4777.30 | bwd_allreduce_microstep: 403.70 | step_microstep: 0.18
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2046
[2024-07-31 16:59:34,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.17 | bwd_microstep: 5125.63 | bwd_inner_microstep: 4728.41 | bwd_allreduce_microstep: 397.15 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212
[2024-07-31 16:59:43,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.54 | bwd_microstep: 5229.28 | bwd_inner_microstep: 4824.00 | bwd_allreduce_microstep: 405.21 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2101
[2024-07-31 16:59:52,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.85 | bwd_microstep: 5115.00 | bwd_inner_microstep: 4719.19 | bwd_allreduce_microstep: 395.74 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666
[2024-07-31 17:00:01,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 17:00:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.04 | bwd_microstep: 4987.71 | bwd_inner_microstep: 4921.55 | bwd_allreduce_microstep: 66.09 | step_microstep: 182.76
[2024-07-31 17:00:01,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28652.34 | bwd: 41749.54 | bwd_inner: 39525.02 | bwd_allreduce: 2224.04 | step: 183.51
61%|██████▏ | 754/1230 [14:48:06<9:17:52, 70.32s/it]
{'loss': 1.1477, 'learning_rate': 6.879791900596077e-06, 'epoch': 0.61}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671
[2024-07-31 17:00:10,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.94 | bwd_microstep: 5412.34 | bwd_inner_microstep: 5307.12 | bwd_allreduce_microstep: 105.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573
[2024-07-31 17:00:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.87 | bwd_microstep: 5106.98 | bwd_inner_microstep: 5030.24 | bwd_allreduce_microstep: 76.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600
[2024-07-31 17:00:27,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.66 | bwd_microstep: 5061.07 | bwd_inner_microstep: 4994.16 | bwd_allreduce_microstep: 66.84 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720
[2024-07-31 17:00:36,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.02 | bwd_microstep: 5035.06 | bwd_inner_microstep: 4995.59 | bwd_allreduce_microstep: 39.40 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712
[2024-07-31 17:00:44,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.39 | bwd_microstep: 5045.30 | bwd_inner_microstep: 4989.85 | bwd_allreduce_microstep: 55.38 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805
[2024-07-31 17:00:53,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.15 | bwd_microstep: 5047.42 | bwd_inner_microstep: 5024.23 | bwd_allreduce_microstep: 23.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 17:01:01,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.89 | bwd_microstep: 4702.38 | bwd_inner_microstep: 4680.42 | bwd_allreduce_microstep: 21.89 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156
[2024-07-31 17:01:10,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 17:01:10,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3458.58 | bwd_microstep: 5032.25 | bwd_inner_microstep: 4639.14 | bwd_allreduce_microstep: 393.04 | step_microstep: 181.79
[2024-07-31 17:01:10,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28378.40 | bwd: 40442.78 | bwd_inner: 39660.70 | bwd_allreduce: 781.60 | step: 182.39
61%|██████▏ | 755/1230 [14:49:16<9:13:56, 69.97s/it]
{'loss': 1.2362, 'learning_rate': 6.854783892471827e-06, 'epoch': 0.61}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 17:01:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.87 | bwd_microstep: 5166.68 | bwd_inner_microstep: 5147.62 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4044
[2024-07-31 17:01:28,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.06 | bwd_microstep: 5227.35 | bwd_inner_microstep: 5200.58 | bwd_allreduce_microstep: 26.70 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743
[2024-07-31 17:01:36,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.89 | bwd_microstep: 5067.52 | bwd_inner_microstep: 5038.20 | bwd_allreduce_microstep: 29.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196
[2024-07-31 17:01:45,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.53 | bwd_microstep: 5220.88 | bwd_inner_microstep: 4814.72 | bwd_allreduce_microstep: 406.09 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 17:01:54,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.34 | bwd_microstep: 4981.82 | bwd_inner_microstep: 4962.46 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145
[2024-07-31 17:02:03,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.73 | bwd_microstep: 5074.67 | bwd_inner_microstep: 4680.85 | bwd_allreduce_microstep: 393.76 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 17:02:11,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.85 | bwd_microstep: 5108.27 | bwd_inner_microstep: 5032.74 | bwd_allreduce_microstep: 75.46 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670
[2024-07-31 17:02:20,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 17:02:20,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.20 | bwd_microstep: 5017.49 | bwd_inner_microstep: 4965.32 | bwd_allreduce_microstep: 52.10 | step_microstep: 182.75
[2024-07-31 17:02:20,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29303.38 | bwd: 40864.68 | bwd_inner: 39842.42 | bwd_allreduce: 1021.76 | step: 183.33
61%|██████▏ | 756/1230 [14:50:26<9:14:01, 70.13s/it]
{'loss': 1.1253, 'learning_rate': 6.829797695022163e-06, 'epoch': 0.61}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3907
[2024-07-31 17:02:30,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3843.68 | bwd_microstep: 5417.01 | bwd_inner_microstep: 5358.33 | bwd_allreduce_microstep: 58.61 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2359
[2024-07-31 17:02:38,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.39 | bwd_microstep: 5191.11 | bwd_inner_microstep: 4785.53 | bwd_allreduce_microstep: 405.51 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725
[2024-07-31 17:02:47,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.11 | bwd_microstep: 5160.99 | bwd_inner_microstep: 5124.30 | bwd_allreduce_microstep: 36.63 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775
[2024-07-31 17:02:56,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.16 | bwd_microstep: 5002.88 | bwd_inner_microstep: 4983.51 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 17:03:05,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.90 | bwd_microstep: 5206.23 | bwd_inner_microstep: 4800.69 | bwd_allreduce_microstep: 405.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179
[2024-07-31 17:03:13,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.96 | bwd_microstep: 5067.07 | bwd_inner_microstep: 4673.85 | bwd_allreduce_microstep: 393.15 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163
[2024-07-31 17:03:22,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.83 | bwd_microstep: 5101.37 | bwd_inner_microstep: 4704.02 | bwd_allreduce_microstep: 397.28 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2101
[2024-07-31 17:03:31,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 17:03:31,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.15 | bwd_microstep: 5231.38 | bwd_inner_microstep: 4824.65 | bwd_allreduce_microstep: 406.65 | step_microstep: 181.75
[2024-07-31 17:03:31,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28997.07 | bwd: 41378.03 | bwd_inner: 39254.82 | bwd_allreduce: 2122.72 | step: 182.33
62%|██████▏ | 757/1230 [14:51:37<9:14:13, 70.30s/it]
{'loss': 1.1082, 'learning_rate': 6.804833481515256e-06, 'epoch': 0.62}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3921
[2024-07-31 17:03:40,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.63 | bwd_microstep: 5439.60 | bwd_inner_microstep: 5372.71 | bwd_allreduce_microstep: 66.82 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2041
[2024-07-31 17:03:49,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.85 | bwd_microstep: 5299.10 | bwd_inner_microstep: 4888.14 | bwd_allreduce_microstep: 410.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751
[2024-07-31 17:03:58,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.18 | bwd_microstep: 4980.97 | bwd_inner_microstep: 4961.64 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2888
[2024-07-31 17:04:06,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.44 | bwd_microstep: 5037.97 | bwd_inner_microstep: 4663.11 | bwd_allreduce_microstep: 374.80 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732
[2024-07-31 17:04:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3915.93 | bwd_microstep: 5113.20 | bwd_inner_microstep: 5087.65 | bwd_allreduce_microstep: 25.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174
[2024-07-31 17:04:24,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.34 | bwd_microstep: 5238.40 | bwd_inner_microstep: 4833.24 | bwd_allreduce_microstep: 405.09 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3677
[2024-07-31 17:04:33,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.23 | bwd_microstep: 4838.65 | bwd_inner_microstep: 4812.23 | bwd_allreduce_microstep: 26.34 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669
[2024-07-31 17:04:41,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 17:04:41,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.53 | bwd_microstep: 5049.64 | bwd_inner_microstep: 4972.36 | bwd_allreduce_microstep: 77.21 | step_microstep: 181.96
[2024-07-31 17:04:41,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29108.03 | bwd: 40997.51 | bwd_inner: 39591.02 | bwd_allreduce: 1406.01 | step: 182.53
62%|██████▏ | 758/1230 [14:52:47<9:13:21, 70.34s/it]
{'loss': 1.1894, 'learning_rate': 6.779891425066818e-06, 'epoch': 0.62}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2280
[2024-07-31 17:04:50,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.92 | bwd_microstep: 5265.38 | bwd_inner_microstep: 4859.56 | bwd_allreduce_microstep: 405.75 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2313
[2024-07-31 17:04:59,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.60 | bwd_microstep: 5281.34 | bwd_inner_microstep: 4871.49 | bwd_allreduce_microstep: 409.79 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3847
[2024-07-31 17:05:08,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.39 | bwd_microstep: 5106.20 | bwd_inner_microstep: 5086.87 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752
[2024-07-31 17:05:17,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.05 | bwd_microstep: 5004.82 | bwd_inner_microstep: 4982.86 | bwd_allreduce_microstep: 21.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794
[2024-07-31 17:05:26,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.09 | bwd_microstep: 5036.14 | bwd_inner_microstep: 5016.87 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3753
[2024-07-31 17:05:34,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.29 | bwd_microstep: 5064.62 | bwd_inner_microstep: 5037.31 | bwd_allreduce_microstep: 27.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705
[2024-07-31 17:05:43,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.15 | bwd_microstep: 5151.50 | bwd_inner_microstep: 5081.87 | bwd_allreduce_microstep: 69.56 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699
[2024-07-31 17:05:52,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 17:05:52,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.00 | bwd_microstep: 4926.78 | bwd_inner_microstep: 4903.69 | bwd_allreduce_microstep: 23.02 | step_microstep: 181.57
[2024-07-31 17:05:52,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29488.41 | bwd: 40836.77 | bwd_inner: 39840.47 | bwd_allreduce: 995.80 | step: 182.14
62%|██████▏ | 759/1230 [14:53:58<9:12:56, 70.44s/it]
{'loss': 1.1347, 'learning_rate': 6.754971698638917e-06, 'epoch': 0.62}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4057
[2024-07-31 17:06:01,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.49 | bwd_microstep: 5388.20 | bwd_inner_microstep: 5362.64 | bwd_allreduce_microstep: 25.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2311
[2024-07-31 17:06:10,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.34 | bwd_microstep: 5046.43 | bwd_inner_microstep: 4651.18 | bwd_allreduce_microstep: 395.18 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3883
[2024-07-31 17:06:19,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.05 | bwd_microstep: 5124.65 | bwd_inner_microstep: 5105.15 | bwd_allreduce_microstep: 19.43 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229
[2024-07-31 17:06:27,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.69 | bwd_microstep: 5264.82 | bwd_inner_microstep: 4856.30 | bwd_allreduce_microstep: 408.45 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723
[2024-07-31 17:06:36,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.24 | bwd_microstep: 4983.05 | bwd_inner_microstep: 4963.70 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733
[2024-07-31 17:06:45,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.56 | bwd_microstep: 4986.31 | bwd_inner_microstep: 4966.93 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719
[2024-07-31 17:06:54,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.54 | bwd_microstep: 4985.60 | bwd_inner_microstep: 4966.26 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 17:07:03,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49
[2024-07-31 17:07:03,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.63 | bwd_microstep: 5248.35 | bwd_inner_microstep: 5158.48 | bwd_allreduce_microstep: 89.80 | step_microstep: 182.67
[2024-07-31 17:07:03,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29282.45 | bwd: 41027.40 | bwd_inner: 40030.61 | bwd_allreduce: 996.30 | step: 183.26
62%|██████▏ | 760/1230 [14:55:09<9:12:15, 70.50s/it]
{'loss': 1.1739, 'learning_rate': 6.730074475038764e-06, 'epoch': 0.62}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3970
[2024-07-31 17:07:12,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3839.22 | bwd_microstep: 5258.52 | bwd_inner_microstep: 5237.39 | bwd_allreduce_microstep: 21.07 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780
[2024-07-31 17:07:20,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.76 | bwd_microstep: 5068.02 | bwd_inner_microstep: 5026.74 | bwd_allreduce_microstep: 41.22 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225
[2024-07-31 17:07:29,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.56 | bwd_microstep: 5254.32 | bwd_inner_microstep: 4847.47 | bwd_allreduce_microstep: 406.78 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758
[2024-07-31 17:07:38,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.37 | bwd_microstep: 4998.70 | bwd_inner_microstep: 4979.21 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 17:07:47,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.01 | bwd_microstep: 4982.51 | bwd_inner_microstep: 4963.21 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3687
[2024-07-31 17:07:55,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.74 | bwd_microstep: 4977.03 | bwd_inner_microstep: 4922.28 | bwd_allreduce_microstep: 54.68 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 17:08:04,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.38 | bwd_microstep: 5216.76 | bwd_inner_microstep: 4811.24 | bwd_allreduce_microstep: 405.45 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3646
[2024-07-31 17:08:13,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 17:08:13,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.15 | bwd_microstep: 5014.29 | bwd_inner_microstep: 4952.55 | bwd_allreduce_microstep: 61.67 | step_microstep: 182.59
[2024-07-31 17:08:13,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29109.09 | bwd: 40770.13 | bwd_inner: 39740.02 | bwd_allreduce: 1029.61 | step: 183.20
62%|██████▏ | 761/1230 [14:56:19<9:10:23, 70.41s/it]
{'loss': 1.1456, 'learning_rate': 6.7051999269175405e-06, 'epoch': 0.62}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2448
[2024-07-31 17:08:22,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.98 | bwd_microstep: 5308.29 | bwd_inner_microstep: 4900.39 | bwd_allreduce_microstep: 407.83 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3903
[2024-07-31 17:08:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3806.25 | bwd_microstep: 5113.07 | bwd_inner_microstep: 5093.76 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141
[2024-07-31 17:08:39,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.47 | bwd_microstep: 5184.67 | bwd_inner_microstep: 4783.76 | bwd_allreduce_microstep: 400.83 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2742
[2024-07-31 17:08:48,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.71 | bwd_microstep: 5184.02 | bwd_inner_microstep: 4778.05 | bwd_allreduce_microstep: 405.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3666
[2024-07-31 17:08:57,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.35 | bwd_microstep: 4902.05 | bwd_inner_microstep: 4877.95 | bwd_allreduce_microstep: 24.04 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717
[2024-07-31 17:09:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.72 | bwd_microstep: 4986.46 | bwd_inner_microstep: 4967.07 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629
[2024-07-31 17:09:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.42 | bwd_microstep: 5052.84 | bwd_inner_microstep: 4988.71 | bwd_allreduce_microstep: 64.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 17:09:23,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 17:09:23,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.08 | bwd_microstep: 5033.53 | bwd_inner_microstep: 4970.33 | bwd_allreduce_microstep: 63.13 | step_microstep: 181.52
[2024-07-31 17:09:23,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29024.89 | bwd: 40764.92 | bwd_inner: 39359.96 | bwd_allreduce: 1404.46 | step: 182.11
62%|██████▏ | 762/1230 [14:57:29<9:08:32, 70.32s/it]
{'loss': 1.1876, 'learning_rate': 6.680348226769164e-06, 'epoch': 0.62}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3976
[2024-07-31 17:09:32,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.60 | bwd_microstep: 5361.67 | bwd_inner_microstep: 5318.17 | bwd_allreduce_microstep: 43.43 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212
[2024-07-31 17:09:41,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.62 | bwd_microstep: 5245.48 | bwd_inner_microstep: 4839.39 | bwd_allreduce_microstep: 406.01 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2070
[2024-07-31 17:09:49,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2995.84 | bwd_microstep: 4838.64 | bwd_inner_microstep: 4466.27 | bwd_allreduce_microstep: 372.30 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2889
[2024-07-31 17:09:57,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.22 | bwd_microstep: 5140.99 | bwd_inner_microstep: 4740.22 | bwd_allreduce_microstep: 400.71 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714
[2024-07-31 17:10:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.27 | bwd_microstep: 4827.29 | bwd_inner_microstep: 4802.82 | bwd_allreduce_microstep: 24.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167
[2024-07-31 17:10:13,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3015.00 | bwd_microstep: 4885.56 | bwd_inner_microstep: 4510.02 | bwd_allreduce_microstep: 375.47 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187
[2024-07-31 17:10:22,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.26 | bwd_microstep: 5109.63 | bwd_inner_microstep: 4715.73 | bwd_allreduce_microstep: 393.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640
[2024-07-31 17:10:31,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 17:10:31,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.85 | bwd_microstep: 5014.52 | bwd_inner_microstep: 4956.12 | bwd_allreduce_microstep: 58.33 | step_microstep: 182.39
[2024-07-31 17:10:31,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27164.57 | bwd: 40423.76 | bwd_inner: 38348.69 | bwd_allreduce: 2074.58 | step: 182.99
62%|██████▏ | 763/1230 [14:58:37<9:01:44, 69.60s/it]
{'loss': 1.1199, 'learning_rate': 6.655519546929121e-06, 'epoch': 0.62}
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2240
[2024-07-31 17:10:40,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.40 | bwd_microstep: 5471.60 | bwd_inner_microstep: 5049.68 | bwd_allreduce_microstep: 421.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3865
[2024-07-31 17:10:49,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.12 | bwd_microstep: 5516.60 | bwd_inner_microstep: 5422.47 | bwd_allreduce_microstep: 94.06 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807
[2024-07-31 17:10:58,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.98 | bwd_microstep: 5094.63 | bwd_inner_microstep: 5065.27 | bwd_allreduce_microstep: 29.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 17:11:07,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.35 | bwd_microstep: 5142.76 | bwd_inner_microstep: 5072.00 | bwd_allreduce_microstep: 70.70 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2180
[2024-07-31 17:11:15,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2997.50 | bwd_microstep: 4857.01 | bwd_inner_microstep: 4484.35 | bwd_allreduce_microstep: 372.59 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743
[2024-07-31 17:11:24,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.91 | bwd_microstep: 4997.35 | bwd_inner_microstep: 4966.55 | bwd_allreduce_microstep: 30.74 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 17:11:32,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.62 | bwd_microstep: 5051.90 | bwd_inner_microstep: 4993.12 | bwd_allreduce_microstep: 58.71 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684
[2024-07-31 17:11:41,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 17:11:41,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.39 | bwd_microstep: 4987.71 | bwd_inner_microstep: 4939.43 | bwd_allreduce_microstep: 48.21 | step_microstep: 182.86
[2024-07-31 17:11:41,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28592.14 | bwd: 41119.54 | bwd_inner: 39992.80 | bwd_allreduce: 1126.25 | step: 183.57
62%|██████▏ | 764/1230 [14:59:47<9:01:36, 69.73s/it]
{'loss': 1.1836, 'learning_rate': 6.630714059573267e-06, 'epoch': 0.62}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4088
[2024-07-31 17:11:50,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.49 | bwd_microstep: 5358.82 | bwd_inner_microstep: 5339.63 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2810
[2024-07-31 17:11:58,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.51 | bwd_microstep: 5005.87 | bwd_inner_microstep: 4619.88 | bwd_allreduce_microstep: 385.92 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766
[2024-07-31 17:12:07,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.19 | bwd_microstep: 5011.33 | bwd_inner_microstep: 4991.89 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3764
[2024-07-31 17:12:16,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.54 | bwd_microstep: 5105.16 | bwd_inner_microstep: 5072.12 | bwd_allreduce_microstep: 32.97 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 17:12:25,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.81 | bwd_microstep: 4970.03 | bwd_inner_microstep: 4941.30 | bwd_allreduce_microstep: 28.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710
[2024-07-31 17:12:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.14 | bwd_microstep: 5056.35 | bwd_inner_microstep: 5000.03 | bwd_allreduce_microstep: 56.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671
[2024-07-31 17:12:42,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.26 | bwd_microstep: 4923.45 | bwd_inner_microstep: 4897.53 |
bwd_allreduce_microstep: 25.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 17:12:51,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 17:12:51,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.96 | bwd_microstep: 5044.79 | bwd_inner_microstep: 4986.46 | bwd_allreduce_microstep: 58.26 | step_microstep: 181.60 [2024-07-31 17:12:51,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28889.79 | bwd: 40475.77 | bwd_inner: 39848.78 | bwd_allreduce: 626.51 | step: 182.18 62%|██████▏ | 765/1230 [15:00:57<9:00:21, 69.72s/it] {'loss': 1.2137, 'learning_rate': 6.6059319367166165e-06, 'epoch': 0.62} 62%|██████▏ | 765/1230 [15:00:57<9:00:21, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3550 [2024-07-31 17:12:59,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.96 | bwd_microstep: 5136.37 | bwd_inner_microstep: 5059.76 | bwd_allreduce_microstep: 76.54 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-31 17:13:08,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.22 | bwd_microstep: 5387.34 | bwd_inner_microstep: 5233.97 | bwd_allreduce_microstep: 153.30 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3593 [2024-07-31 17:13:17,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.74 | bwd_microstep: 5213.16 | bwd_inner_microstep: 5146.17 | bwd_allreduce_microstep: 66.92 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3097 [2024-07-31 17:13:26,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3410.30 | bwd_microstep: 5096.62 | bwd_inner_microstep: 4848.86 | bwd_allreduce_microstep: 247.68 | step_microstep: 0.08 dynamic ViT batch size: 
8, images per sample: 4.0, dynamic token length: 2074 [2024-07-31 17:13:35,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.04 | bwd_microstep: 5163.03 | bwd_inner_microstep: 4760.73 | bwd_allreduce_microstep: 402.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 17:13:43,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3083.39 | bwd_microstep: 5106.94 | bwd_inner_microstep: 4714.37 | bwd_allreduce_microstep: 392.50 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 17:13:52,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3900.25 | bwd_microstep: 5086.42 | bwd_inner_microstep: 5067.11 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 17:14:01,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-31 17:14:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.32 | bwd_microstep: 5173.45 | bwd_inner_microstep: 5098.83 | bwd_allreduce_microstep: 74.55 | step_microstep: 182.04 [2024-07-31 17:14:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28386.11 | bwd: 41363.31 | bwd_inner: 39929.74 | bwd_allreduce: 1433.07 | step: 182.74 62%|██████▏ | 766/1230 [15:02:07<9:00:01, 69.83s/it] {'loss': 1.1481, 'learning_rate': 6.5811733502121715e-06, 'epoch': 0.62} 62%|██████▏ | 766/1230 [15:02:07<9:00:01, 69.83s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3798 [2024-07-31 17:14:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.45 | bwd_microstep: 5506.97 | bwd_inner_microstep: 5430.84 | bwd_allreduce_microstep: 76.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 
17:14:19,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.72 | bwd_microstep: 5346.62 | bwd_inner_microstep: 5263.89 | bwd_allreduce_microstep: 82.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-31 17:14:28,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.56 | bwd_microstep: 5121.21 | bwd_inner_microstep: 5078.32 | bwd_allreduce_microstep: 42.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 [2024-07-31 17:14:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.12 | bwd_microstep: 5261.15 | bwd_inner_microstep: 4850.24 | bwd_allreduce_microstep: 410.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 17:14:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.98 | bwd_microstep: 4974.50 | bwd_inner_microstep: 4939.32 | bwd_allreduce_microstep: 35.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 17:14:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.52 | bwd_microstep: 5017.47 | bwd_inner_microstep: 4631.69 | bwd_allreduce_microstep: 385.72 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3667 [2024-07-31 17:15:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3420.50 | bwd_microstep: 4796.62 | bwd_inner_microstep: 4777.30 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-31 17:15:11,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 17:15:11,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.15 | bwd_microstep: 5180.80 | 
bwd_inner_microstep: 5161.29 | bwd_allreduce_microstep: 19.43 | step_microstep: 182.77 [2024-07-31 17:15:11,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28572.91 | bwd: 41205.32 | bwd_inner: 40132.81 | bwd_allreduce: 1072.00 | step: 183.34 62%|██████▏ | 767/1230 [15:03:17<8:59:30, 69.91s/it] {'loss': 1.1523, 'learning_rate': 6.556438471749708e-06, 'epoch': 0.62} 62%|██████▏ | 767/1230 [15:03:17<8:59:30, 69.91s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 17:15:20,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3875.31 | bwd_microstep: 5416.57 | bwd_inner_microstep: 5397.55 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3945 [2024-07-31 17:15:29,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.82 | bwd_microstep: 5190.64 | bwd_inner_microstep: 5171.33 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808 [2024-07-31 17:15:38,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.05 | bwd_microstep: 5176.22 | bwd_inner_microstep: 5139.88 | bwd_allreduce_microstep: 36.28 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3882 [2024-07-31 17:15:47,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.57 | bwd_microstep: 5134.28 | bwd_inner_microstep: 5114.97 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 17:15:56,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.19 | bwd_microstep: 5062.16 | bwd_inner_microstep: 5000.01 | bwd_allreduce_microstep: 62.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 
17:16:05,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.71 | bwd_microstep: 5126.91 | bwd_inner_microstep: 5060.34 | bwd_allreduce_microstep: 66.51 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 17:16:13,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.28 | bwd_microstep: 4993.65 | bwd_inner_microstep: 4956.74 | bwd_allreduce_microstep: 36.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 17:16:22,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 17:16:22,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.54 | bwd_microstep: 4968.00 | bwd_inner_microstep: 4918.67 | bwd_allreduce_microstep: 49.26 | step_microstep: 182.81 [2024-07-31 17:16:22,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29711.39 | bwd: 41068.42 | bwd_inner: 40759.42 | bwd_allreduce: 308.50 | step: 183.52 62%|██████▏ | 768/1230 [15:04:28<9:01:06, 70.27s/it] {'loss': 1.1573, 'learning_rate': 6.531727472854614e-06, 'epoch': 0.62} 62%|██████▏ | 768/1230 [15:04:28<9:01:06, 70.27s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2309 [2024-07-31 17:16:31,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.14 | bwd_microstep: 5246.13 | bwd_inner_microstep: 4843.23 | bwd_allreduce_microstep: 402.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-31 17:16:40,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.01 | bwd_microstep: 5141.89 | bwd_inner_microstep: 5094.31 | bwd_allreduce_microstep: 47.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 17:16:48,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3731.58 | bwd_microstep: 5001.75 | bwd_inner_microstep: 4982.39 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3742 [2024-07-31 17:16:57,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.68 | bwd_microstep: 5193.22 | bwd_inner_microstep: 5115.40 | bwd_allreduce_microstep: 77.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 17:17:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.95 | bwd_microstep: 5039.31 | bwd_inner_microstep: 5012.69 | bwd_allreduce_microstep: 26.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 17:17:15,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.46 | bwd_microstep: 4981.30 | bwd_inner_microstep: 4961.89 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 17:17:23,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.29 | bwd_microstep: 5116.60 | bwd_inner_microstep: 5047.73 | bwd_allreduce_microstep: 68.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 17:17:32,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 17:17:32,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.30 | bwd_microstep: 4792.63 | bwd_inner_microstep: 4759.55 | bwd_allreduce_microstep: 33.01 | step_microstep: 182.41 [2024-07-31 17:17:32,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28774.32 | bwd: 40512.80 | bwd_inner: 39817.13 | bwd_allreduce: 695.18 | step: 183.01 63%|██████▎ | 769/1230 [15:05:37<8:58:25, 70.08s/it] {'loss': 1.1651, 'learning_rate': 6.507040524886674e-06, 'epoch': 
0.63} 63%|██████▎ | 769/1230 [15:05:37<8:58:25, 70.08s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2342 [2024-07-31 17:17:41,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.44 | bwd_microstep: 5410.94 | bwd_inner_microstep: 4999.24 | bwd_allreduce_microstep: 411.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3551 [2024-07-31 17:17:49,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.19 | bwd_microstep: 5177.84 | bwd_inner_microstep: 5091.79 | bwd_allreduce_microstep: 85.98 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-31 17:17:58,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.98 | bwd_microstep: 5074.95 | bwd_inner_microstep: 5002.86 | bwd_allreduce_microstep: 72.02 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3735 [2024-07-31 17:18:07,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.05 | bwd_microstep: 5136.66 | bwd_inner_microstep: 5078.18 | bwd_allreduce_microstep: 58.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-31 17:18:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3069.42 | bwd_microstep: 5068.33 | bwd_inner_microstep: 4677.88 | bwd_allreduce_microstep: 390.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 17:18:23,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.97 | bwd_microstep: 4757.35 | bwd_inner_microstep: 4730.74 | bwd_allreduce_microstep: 26.53 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 17:18:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3541.40 | bwd_microstep: 5129.67 | bwd_inner_microstep: 4733.69 | bwd_allreduce_microstep: 395.91 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 17:18:40,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 17:18:40,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.60 | bwd_microstep: 5068.64 | bwd_inner_microstep: 5008.66 | bwd_allreduce_microstep: 59.91 | step_microstep: 182.94 [2024-07-31 17:18:40,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27715.98 | bwd: 40824.36 | bwd_inner: 39322.97 | bwd_allreduce: 1500.87 | step: 183.56 63%|██████▎ | 770/1230 [15:06:46<8:54:29, 69.72s/it] {'loss': 1.1269, 'learning_rate': 6.4823777990388835e-06, 'epoch': 0.63} 63%|██████▎ | 770/1230 [15:06:46<8:54:29, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3906 [2024-07-31 17:18:50,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.20 | bwd_microstep: 5492.52 | bwd_inner_microstep: 5413.41 | bwd_allreduce_microstep: 79.04 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2311 [2024-07-31 17:18:59,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.48 | bwd_microstep: 5423.57 | bwd_inner_microstep: 5005.02 | bwd_allreduce_microstep: 418.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 17:19:08,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.67 | bwd_microstep: 5220.86 | bwd_inner_microstep: 5140.40 | bwd_allreduce_microstep: 80.39 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 17:19:16,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.11 | bwd_microstep: 4889.93 | bwd_inner_microstep: 4834.08 | 
bwd_allreduce_microstep: 55.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 17:19:24,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.55 | bwd_microstep: 4970.99 | bwd_inner_microstep: 4936.58 | bwd_allreduce_microstep: 34.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 17:19:32,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2997.28 | bwd_microstep: 4876.40 | bwd_inner_microstep: 4500.76 | bwd_allreduce_microstep: 375.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 17:19:41,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.30 | bwd_microstep: 4988.79 | bwd_inner_microstep: 4957.55 | bwd_allreduce_microstep: 31.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2122 [2024-07-31 17:19:50,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 17:19:50,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.70 | bwd_microstep: 5084.18 | bwd_inner_microstep: 4689.64 | bwd_allreduce_microstep: 394.47 | step_microstep: 189.63 [2024-07-31 17:19:50,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27977.19 | bwd: 40947.23 | bwd_inner: 39477.38 | bwd_allreduce: 1469.37 | step: 190.32 63%|██████▎ | 771/1230 [15:07:56<8:52:17, 69.58s/it] {'loss': 1.0994, 'learning_rate': 6.45773946633628e-06, 'epoch': 0.63} 63%|██████▎ | 771/1230 [15:07:56<8:52:17, 69.58s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3919 [2024-07-31 17:19:59,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.06 | bwd_microstep: 5400.20 | bwd_inner_microstep: 5332.22 | bwd_allreduce_microstep: 67.91 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3580 [2024-07-31 17:20:08,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.00 | bwd_microstep: 5278.20 | bwd_inner_microstep: 5180.84 | bwd_allreduce_microstep: 97.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3835 [2024-07-31 17:20:16,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.70 | bwd_microstep: 5127.80 | bwd_inner_microstep: 5084.25 | bwd_allreduce_microstep: 43.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-31 17:20:25,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.73 | bwd_microstep: 5207.31 | bwd_inner_microstep: 4801.33 | bwd_allreduce_microstep: 405.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 17:20:34,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.15 | bwd_microstep: 5213.05 | bwd_inner_microstep: 5133.39 | bwd_allreduce_microstep: 79.59 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3664 [2024-07-31 17:20:43,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.57 | bwd_microstep: 4967.20 | bwd_inner_microstep: 4911.17 | bwd_allreduce_microstep: 55.97 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 17:20:51,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.71 | bwd_microstep: 4878.31 | bwd_inner_microstep: 4858.91 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 17:21:00,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 17:21:00,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3523.86 | bwd_microstep: 5195.60 | bwd_inner_microstep: 4792.83 | bwd_allreduce_microstep: 402.70 | step_microstep: 181.45 [2024-07-31 17:21:00,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28831.68 | bwd: 41267.68 | bwd_inner: 40094.87 | bwd_allreduce: 1172.30 | step: 182.06 63%|██████▎ | 772/1230 [15:09:06<8:53:04, 69.84s/it] {'loss': 1.1059, 'learning_rate': 6.4331256976347434e-06, 'epoch': 0.63} 63%|██████▎ | 772/1230 [15:09:06<8:53:04, 69.84s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3927 [2024-07-31 17:21:09,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.88 | bwd_microstep: 5221.29 | bwd_inner_microstep: 5191.69 | bwd_allreduce_microstep: 29.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3565 [2024-07-31 17:21:18,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.95 | bwd_microstep: 5127.15 | bwd_inner_microstep: 5047.49 | bwd_allreduce_microstep: 79.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 17:21:27,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.79 | bwd_microstep: 5257.92 | bwd_inner_microstep: 4849.73 | bwd_allreduce_microstep: 408.12 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 17:21:36,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.68 | bwd_microstep: 5293.42 | bwd_inner_microstep: 5221.64 | bwd_allreduce_microstep: 71.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-31 17:21:44,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.45 | bwd_microstep: 4959.70 | bwd_inner_microstep: 4928.18 | bwd_allreduce_microstep: 31.45 | step_microstep: 0.19 dynamic ViT batch size: 26, images per 
sample: 13.0, dynamic token length: 3696 [2024-07-31 17:21:53,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.58 | bwd_microstep: 5049.46 | bwd_inner_microstep: 5008.71 | bwd_allreduce_microstep: 40.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2107 [2024-07-31 17:22:02,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.21 | bwd_microstep: 5041.16 | bwd_inner_microstep: 4651.16 | bwd_allreduce_microstep: 389.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-31 17:22:10,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 17:22:10,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.13 | bwd_microstep: 5097.57 | bwd_inner_microstep: 4703.12 | bwd_allreduce_microstep: 394.39 | step_microstep: 182.66 [2024-07-31 17:22:10,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28961.58 | bwd: 41047.65 | bwd_inner: 39601.64 | bwd_allreduce: 1445.52 | step: 183.36 63%|██████▎ | 773/1230 [15:10:16<8:53:04, 69.99s/it] {'loss': 1.1778, 'learning_rate': 6.408536663619803e-06, 'epoch': 0.63} 63%|██████▎ | 773/1230 [15:10:16<8:53:04, 69.99s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 4096 [2024-07-31 17:22:19,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.18 | bwd_microstep: 5251.89 | bwd_inner_microstep: 5210.44 | bwd_allreduce_microstep: 41.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-31 17:22:28,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.88 | bwd_microstep: 5098.70 | bwd_inner_microstep: 5070.10 | bwd_allreduce_microstep: 28.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 17:22:37,566] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.33 | bwd_microstep: 5150.73 | bwd_inner_microstep: 5071.81 | bwd_allreduce_microstep: 78.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 17:22:46,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.64 | bwd_microstep: 5126.15 | bwd_inner_microstep: 5076.01 | bwd_allreduce_microstep: 50.07 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2095 [2024-07-31 17:22:55,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.63 | bwd_microstep: 5152.61 | bwd_inner_microstep: 4752.41 | bwd_allreduce_microstep: 400.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 17:23:03,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.56 | bwd_microstep: 5061.54 | bwd_inner_microstep: 5018.14 | bwd_allreduce_microstep: 43.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 17:23:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.01 | bwd_microstep: 4879.34 | bwd_inner_microstep: 4859.99 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 17:23:21,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-31 17:23:21,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.14 | bwd_microstep: 4982.20 | bwd_inner_microstep: 4946.64 | bwd_allreduce_microstep: 35.50 | step_microstep: 181.91 [2024-07-31 17:23:21,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29297.27 | bwd: 40703.16 | bwd_inner: 40005.48 | bwd_allreduce: 697.19 | step: 182.49 63%|██████▎ | 774/1230 [15:11:27<8:52:41, 70.09s/it] {'loss': 1.1603, 
'learning_rate': 6.383972534805477e-06, 'epoch': 0.63}
63%|██████▎ | 774/1230 [15:11:27<8:52:41, 70.09s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4074
[2024-07-31 17:23:30,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3860.70 | bwd_microstep: 5350.75 | bwd_inner_microstep: 5331.67 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790
[2024-07-31 17:23:39,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.06 | bwd_microstep: 5020.14 | bwd_inner_microstep: 5000.70 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 17:23:48,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.41 | bwd_microstep: 5167.74 | bwd_inner_microstep: 5088.36 | bwd_allreduce_microstep: 79.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237
[2024-07-31 17:23:57,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.72 | bwd_microstep: 5305.78 | bwd_inner_microstep: 4896.57 | bwd_allreduce_microstep: 409.14 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729
[2024-07-31 17:24:05,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.47 | bwd_microstep: 4976.56 | bwd_inner_microstep: 4957.28 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2900
[2024-07-31 17:24:14,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.62 | bwd_microstep: 5151.30 | bwd_inner_microstep: 4748.02 | bwd_allreduce_microstep: 403.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639
[2024-07-31 17:24:23,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.98 | bwd_microstep: 5074.56 | bwd_inner_microstep: 5009.12 | bwd_allreduce_microstep: 65.37 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 17:24:32,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 17:24:32,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.23 | bwd_microstep: 5043.72 | bwd_inner_microstep: 5016.40 | bwd_allreduce_microstep: 27.25 | step_microstep: 181.29
[2024-07-31 17:24:32,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29349.10 | bwd: 41090.54 | bwd_inner: 40048.08 | bwd_allreduce: 1041.97 | step: 181.88
63%|██████▎ | 775/1230 [15:12:37<8:53:04, 70.30s/it]
{'loss': 1.1653, 'learning_rate': 6.359433481533072e-06, 'epoch': 0.63}
63%|██████▎ | 775/1230 [15:12:37<8:53:04, 70.30s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 17:24:41,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.96 | bwd_microstep: 5337.87 | bwd_inner_microstep: 5318.67 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580
[2024-07-31 17:24:50,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.66 | bwd_microstep: 5232.73 | bwd_inner_microstep: 5136.57 | bwd_allreduce_microstep: 96.09 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812
[2024-07-31 17:24:58,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.16 | bwd_microstep: 5042.65 | bwd_inner_microstep: 5023.40 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282
[2024-07-31 17:25:07,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.10 | bwd_microstep: 5208.06 | bwd_inner_microstep: 4799.18 | bwd_allreduce_microstep: 408.81 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 17:25:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.15 | bwd_microstep: 5183.13 | bwd_inner_microstep: 5105.97 | bwd_allreduce_microstep: 77.10 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751
[2024-07-31 17:25:25,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.46 | bwd_microstep: 5165.69 | bwd_inner_microstep: 5112.30 | bwd_allreduce_microstep: 53.32 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 17:25:34,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.62 | bwd_microstep: 5054.73 | bwd_inner_microstep: 5014.32 | bwd_allreduce_microstep: 40.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698
[2024-07-31 17:25:42,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 17:25:42,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.47 | bwd_microstep: 4916.19 | bwd_inner_microstep: 4896.82 | bwd_allreduce_microstep: 19.31 | step_microstep: 182.46
[2024-07-31 17:25:42,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29301.48 | bwd: 41141.05 | bwd_inner: 40407.17 | bwd_allreduce: 733.39 | step: 183.17
63%|██████▎ | 776/1230 [15:13:48<8:53:00, 70.44s/it]
{'loss': 1.1883, 'learning_rate': 6.3349196739700024e-06, 'epoch': 0.63}
63%|██████▎ | 776/1230 [15:13:48<8:53:00, 70.44s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 17:25:52,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.45 | bwd_microstep: 5358.81 | bwd_inner_microstep: 5339.72 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238
[2024-07-31 17:26:00,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.61 | bwd_microstep: 5157.28 | bwd_inner_microstep: 4759.00 | bwd_allreduce_microstep: 398.22 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 17:26:09,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.52 | bwd_microstep: 5118.37 | bwd_inner_microstep: 5045.01 | bwd_allreduce_microstep: 73.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223
[2024-07-31 17:26:17,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3131.66 | bwd_microstep: 4979.17 | bwd_inner_microstep: 4595.29 | bwd_allreduce_microstep: 383.82 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725
[2024-07-31 17:26:26,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.97 | bwd_microstep: 4977.30 | bwd_inner_microstep: 4957.94 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171
[2024-07-31 17:26:34,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3315.38 | bwd_microstep: 5005.77 | bwd_inner_microstep: 4617.70 | bwd_allreduce_microstep: 388.00 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140
[2024-07-31 17:26:43,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.22 | bwd_microstep: 4996.17 | bwd_inner_microstep: 4606.33 | bwd_allreduce_microstep: 389.76 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3692
[2024-07-31 17:26:52,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 17:26:52,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.34 | bwd_microstep: 5120.07 | bwd_inner_microstep: 5032.91 | bwd_allreduce_microstep: 87.09 | step_microstep: 181.83
[2024-07-31 17:26:52,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28212.07 | bwd: 40712.93 | bwd_inner: 38953.85 | bwd_allreduce: 1758.60 | step: 182.42
63%|██████▎ | 777/1230 [15:14:58<8:49:09, 70.09s/it]
{'loss': 1.1621, 'learning_rate': 6.310431282108625e-06, 'epoch': 0.63}
63%|██████▎ | 777/1230 [15:14:58<8:49:09, 70.09s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3934
[2024-07-31 17:27:01,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.98 | bwd_microstep: 5226.14 | bwd_inner_microstep: 5177.41 | bwd_allreduce_microstep: 48.66 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3570
[2024-07-31 17:27:10,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.62 | bwd_microstep: 5375.84 | bwd_inner_microstep: 5210.29 | bwd_allreduce_microstep: 165.48 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583
[2024-07-31 17:27:18,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.39 | bwd_microstep: 5184.83 | bwd_inner_microstep: 5098.73 | bwd_allreduce_microstep: 86.04 | step_microstep: 0.07
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771
[2024-07-31 17:27:27,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.41 | bwd_microstep: 5156.30 | bwd_inner_microstep: 5102.91 | bwd_allreduce_microstep: 53.31 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709
[2024-07-31 17:27:36,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.72 | bwd_microstep: 4977.01 | bwd_inner_microstep: 4928.44 | bwd_allreduce_microstep: 48.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 17:27:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.99 | bwd_microstep: 5048.76 | bwd_inner_microstep: 4985.68 | bwd_allreduce_microstep: 63.02 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692
[2024-07-31 17:27:53,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.08 | bwd_microstep: 4898.82 | bwd_inner_microstep: 4879.35 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656
[2024-07-31 17:28:02,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 17:28:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.92 | bwd_microstep: 5204.12 | bwd_inner_microstep: 5125.72 | bwd_allreduce_microstep: 78.31 | step_microstep: 181.76
[2024-07-31 17:28:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28953.03 | bwd: 41071.80 | bwd_inner: 40508.46 | bwd_allreduce: 562.83 | step: 182.35
63%|██████▎ | 778/1230 [15:16:08<8:48:35, 70.17s/it]
{'loss': 1.1714, 'learning_rate': 6.2859684757650365e-06, 'epoch': 0.63}
63%|██████▎ | 778/1230 [15:16:08<8:48:35, 70.17s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3869
[2024-07-31 17:28:11,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.31 | bwd_microstep: 5138.99 | bwd_inner_microstep: 5101.27 | bwd_allreduce_microstep: 37.65 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3563
[2024-07-31 17:28:20,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.81 | bwd_microstep: 5210.59 | bwd_inner_microstep: 5107.51 | bwd_allreduce_microstep: 103.01 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177
[2024-07-31 17:28:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.24 | bwd_microstep: 5228.17 | bwd_inner_microstep: 4822.48 | bwd_allreduce_microstep: 405.63 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703
[2024-07-31 17:28:37,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.50 | bwd_microstep: 5047.91 | bwd_inner_microstep: 4988.66 | bwd_allreduce_microstep: 59.18 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2079
[2024-07-31 17:28:46,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.47 | bwd_microstep: 5261.37 | bwd_inner_microstep: 4853.62 | bwd_allreduce_microstep: 407.69 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2130
[2024-07-31 17:28:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.69 | bwd_microstep: 5213.59 | bwd_inner_microstep: 4807.00 | bwd_allreduce_microstep: 406.53 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642
[2024-07-31 17:29:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.12 | bwd_microstep: 5192.55 | bwd_inner_microstep: 5107.08 | bwd_allreduce_microstep: 85.41 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681
[2024-07-31 17:29:12,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 17:29:12,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.09 | bwd_microstep: 4887.03 | bwd_inner_microstep: 4867.60 | bwd_allreduce_microstep: 19.36 | step_microstep: 181.58
[2024-07-31 17:29:12,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28771.14 | bwd: 41180.19 | bwd_inner: 39655.17 | bwd_allreduce: 1524.55 | step: 182.28
63%|██████▎ | 779/1230 [15:17:18<8:47:42, 70.20s/it]
{'loss': 1.132, 'learning_rate': 6.26153142457792e-06, 'epoch': 0.63}
63%|██████▎ | 779/1230 [15:17:18<8:47:42, 70.20s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3967
[2024-07-31 17:29:21,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.89 | bwd_microstep: 5243.51 | bwd_inner_microstep: 5224.46 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3793
[2024-07-31 17:29:30,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.67 | bwd_microstep: 5016.56 | bwd_inner_microstep: 4997.11 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590
[2024-07-31 17:29:39,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.81 | bwd_microstep: 5123.37 | bwd_inner_microstep: 5051.09 | bwd_allreduce_microstep: 72.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758
[2024-07-31 17:29:48,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.89 | bwd_microstep: 5003.74 | bwd_inner_microstep: 4984.38 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617
[2024-07-31 17:29:56,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.79 | bwd_microstep: 5126.83 | bwd_inner_microstep: 5057.83 | bwd_allreduce_microstep: 68.93 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180
[2024-07-31 17:30:05,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.80 | bwd_microstep: 5118.51 | bwd_inner_microstep: 4722.23 | bwd_allreduce_microstep: 396.21 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135
[2024-07-31 17:30:13,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3024.52 | bwd_microstep: 4929.16 | bwd_inner_microstep: 4550.68 | bwd_allreduce_microstep: 378.41 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659
[2024-07-31 17:30:22,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47
[2024-07-31 17:30:22,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.86 | bwd_microstep: 4999.08 | bwd_inner_microstep: 4950.99 | bwd_allreduce_microstep: 48.02 | step_microstep: 181.91
[2024-07-31 17:30:22,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28567.14 | bwd: 40560.74 | bwd_inner: 39538.72 | bwd_allreduce: 1021.53 | step: 182.49
63%|██████▎ | 780/1230 [15:18:28<8:44:51, 69.98s/it]
{'loss': 1.1728, 'learning_rate': 6.2371202980073596e-06, 'epoch': 0.63}
63%|██████▎ | 780/1230 [15:18:28<8:44:51, 69.98s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3884
[2024-07-31 17:30:31,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3838.00 | bwd_microstep: 5364.33 | bwd_inner_microstep: 5309.10 | bwd_allreduce_microstep: 55.16 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825
[2024-07-31 17:30:40,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3818.38 | bwd_microstep: 5240.89 | bwd_inner_microstep: 5200.22 | bwd_allreduce_microstep: 40.60 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779
[2024-07-31 17:30:49,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.68 | bwd_microstep: 5075.35 | bwd_inner_microstep: 5051.08 | bwd_allreduce_microstep: 24.20 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752
[2024-07-31 17:30:58,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.36 | bwd_microstep: 5012.47 | bwd_inner_microstep: 4993.07 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647
[2024-07-31 17:31:06,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.32 | bwd_microstep: 5040.29 | bwd_inner_microstep: 4980.29 | bwd_allreduce_microstep: 59.93 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755
[2024-07-31 17:31:15,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.07 | bwd_microstep: 5007.40 | bwd_inner_microstep: 4988.12 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 17:31:23,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.27 | bwd_microstep: 4810.45 | bwd_inner_microstep: 4774.28 | bwd_allreduce_microstep: 36.11 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706
[2024-07-31 17:31:32,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 17:31:32,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.80 | bwd_microstep: 5133.65 | bwd_inner_microstep: 5068.84 | bwd_allreduce_microstep: 64.74 | step_microstep: 181.57
[2024-07-31 17:31:32,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29281.80 | bwd: 40684.82 | bwd_inner: 40364.95 | bwd_allreduce: 319.38 | step: 182.16
63%|██████▎ | 781/1230 [15:19:38<8:44:24, 70.08s/it]
{'loss': 1.1265, 'learning_rate': 6.212735265333655e-06, 'epoch': 0.63}
63%|██████▎ | 781/1230 [15:19:38<8:44:24, 70.08s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3919
[2024-07-31 17:31:41,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.26 | bwd_microstep: 5287.84 | bwd_inner_microstep: 5242.28 | bwd_allreduce_microstep: 45.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209
[2024-07-31 17:31:50,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.92 | bwd_microstep: 5216.12 | bwd_inner_microstep: 4811.50 | bwd_allreduce_microstep: 404.56 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236
[2024-07-31 17:31:59,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.26 | bwd_microstep: 5436.67 | bwd_inner_microstep: 5016.13 | bwd_allreduce_microstep: 420.48 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211
[2024-07-31 17:32:08,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.03 | bwd_microstep: 5252.94 | bwd_inner_microstep: 4844.26 | bwd_allreduce_microstep: 408.62 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2170
[2024-07-31 17:32:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.61 | bwd_microstep: 5211.51 | bwd_inner_microstep: 4804.90 | bwd_allreduce_microstep: 406.55 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2108
[2024-07-31 17:32:25,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.06 | bwd_microstep: 5123.09 | bwd_inner_microstep: 4726.13 | bwd_allreduce_microstep: 396.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688
[2024-07-31 17:32:34,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.93 | bwd_microstep: 4871.76 | bwd_inner_microstep: 4852.23 | bwd_allreduce_microstep: 19.47 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 17:32:42,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73
[2024-07-31 17:32:42,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.19 | bwd_microstep: 4874.80 | bwd_inner_microstep: 4830.58 | bwd_allreduce_microstep: 44.15 | step_microstep: 181.71
[2024-07-31 17:32:42,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28361.15 | bwd: 41274.73 | bwd_inner: 39127.94 | bwd_allreduce: 2146.29 | step: 182.41
64%|██████▎ | 782/1230 [15:20:48<8:42:59, 70.04s/it]
{'loss': 1.1612, 'learning_rate': 6.188376495656156e-06, 'epoch': 0.64}
64%|██████▎ | 782/1230 [15:20:48<8:42:59, 70.04s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3905
[2024-07-31 17:32:51,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.73 | bwd_microstep: 5267.49 | bwd_inner_microstep: 5221.92 | bwd_allreduce_microstep: 45.50 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3900
[2024-07-31 17:33:00,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.02 | bwd_microstep: 5123.37 | bwd_inner_microstep: 5104.05 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740
[2024-07-31 17:33:08,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.79 | bwd_microstep: 4914.01 | bwd_inner_microstep: 4880.23 | bwd_allreduce_microstep: 33.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 17:33:17,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.80 | bwd_microstep: 5074.57 | bwd_inner_microstep: 5045.26 | bwd_allreduce_microstep: 29.24 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713
[2024-07-31 17:33:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.88 | bwd_microstep: 4831.97 | bwd_inner_microstep: 4807.04 | bwd_allreduce_microstep: 24.86 | step_microstep: 0.08
dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2084
[2024-07-31 17:33:33,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.23 | bwd_microstep: 5028.24 | bwd_inner_microstep: 4643.52 | bwd_allreduce_microstep: 384.66 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675
[2024-07-31 17:33:42,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.93 | bwd_microstep: 5044.95 | bwd_inner_microstep: 4985.82 | bwd_allreduce_microstep: 59.06 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2193
[2024-07-31 17:33:50,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 17:33:50,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.67 | bwd_microstep: 5055.30 | bwd_inner_microstep: 4662.74 | bwd_allreduce_microstep: 392.49 | step_microstep: 181.57
[2024-07-31 17:33:50,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27777.95 | bwd: 40339.88 | bwd_inner: 39350.53 | bwd_allreduce: 988.86 | step: 182.16
64%|██████▎ | 783/1230 [15:21:56<8:38:15, 69.56s/it]
{'loss': 1.1061, 'learning_rate': 6.164044157892102e-06, 'epoch': 0.64}
64%|██████▎ | 783/1230 [15:21:56<8:38:15, 69.56s/it]
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3980
[2024-07-31 17:34:00,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.55 | bwd_microstep: 5380.95 | bwd_inner_microstep: 5337.52 | bwd_allreduce_microstep: 43.36 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820
[2024-07-31 17:34:08,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.80 | bwd_microstep: 5049.38 | bwd_inner_microstep: 5030.04 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 17:34:17,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.54 | bwd_microstep: 5233.68 | bwd_inner_microstep: 5138.97 | bwd_allreduce_microstep: 94.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713
[2024-07-31 17:34:26,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3439.11 | bwd_microstep: 4884.43 | bwd_inner_microstep: 4857.29 | bwd_allreduce_microstep: 27.07 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3665
[2024-07-31 17:34:34,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.53 | bwd_microstep: 5009.95 | bwd_inner_microstep: 4968.11 | bwd_allreduce_microstep: 41.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709
[2024-07-31 17:34:43,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3389.99 | bwd_microstep: 4922.28 | bwd_inner_microstep: 4886.70 | bwd_allreduce_microstep: 35.52 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 17:34:51,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.32 | bwd_microstep: 5056.86 | bwd_inner_microstep: 4996.55 | bwd_allreduce_microstep: 60.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146
[2024-07-31 17:34:59,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 17:34:59,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2999.86 | bwd_microstep: 4882.27 | bwd_inner_microstep: 4506.05 | bwd_allreduce_microstep: 376.16 | step_microstep: 181.28
[2024-07-31 17:34:59,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28244.59 | bwd: 40419.78 | bwd_inner: 39721.16 | bwd_allreduce: 698.14 | step: 181.86
64%|██████▎ | 784/1230 [15:23:05<8:35:49, 69.39s/it]
{'loss': 1.0818, 'learning_rate': 6.13973842077543e-06, 'epoch': 0.64}
64%|██████▎ | 784/1230 [15:23:05<8:35:49, 69.39s/it]
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 1993
[2024-07-31 17:35:08,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.74 | bwd_microstep: 5388.62 | bwd_inner_microstep: 4971.13 | bwd_allreduce_microstep: 417.43 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815
[2024-07-31 17:35:17,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.71 | bwd_microstep: 5024.89 | bwd_inner_microstep: 5005.56 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3776
[2024-07-31 17:35:26,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.28 | bwd_microstep: 5133.13 | bwd_inner_microstep: 5074.30 | bwd_allreduce_microstep: 58.76 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3587
[2024-07-31 17:35:35,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.66 | bwd_microstep: 5016.24 | bwd_inner_microstep: 4933.86 | bwd_allreduce_microstep: 82.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624
[2024-07-31 17:35:43,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.32 | bwd_microstep: 5193.57 | bwd_inner_microstep: 5114.52 | bwd_allreduce_microstep: 78.97 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2866
[2024-07-31 17:35:52,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3089.00 | bwd_microstep: 5001.99 | bwd_inner_microstep: 4638.03 | bwd_allreduce_microstep: 363.90 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 17:36:00,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.89 | bwd_microstep: 5145.40 | bwd_inner_microstep: 5074.58 | bwd_allreduce_microstep: 70.75 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 17:36:08,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 17:36:08,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.03 | bwd_microstep: 4714.76 | bwd_inner_microstep: 4692.03 | bwd_allreduce_microstep: 22.66 | step_microstep: 182.41
[2024-07-31 17:36:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27993.53 | bwd: 40618.58 | bwd_inner: 39503.97 | bwd_allreduce: 1114.13 | step: 183.00
64%|██████▍ | 785/1230 [15:24:14<8:33:40, 69.26s/it]
{'loss': 1.1483, 'learning_rate': 6.1154594528556075e-06, 'epoch': 0.64}
64%|██████▍ | 785/1230 [15:24:14<8:33:40, 69.26s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3942
[2024-07-31 17:36:17,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.84 | bwd_microstep: 5196.42 | bwd_inner_microstep: 5177.29 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3976
[2024-07-31 17:36:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.10 | bwd_microstep: 5103.76 | bwd_inner_microstep: 5080.45 | bwd_allreduce_microstep: 23.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 17:36:35,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.91 | bwd_microstep: 5250.92 | bwd_inner_microstep: 4844.28 | bwd_allreduce_microstep: 406.57 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255
[2024-07-31 17:36:44,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.65 | bwd_microstep: 5166.14 | bwd_inner_microstep: 4760.96 | bwd_allreduce_microstep: 405.12 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 17:36:52,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.90 | bwd_microstep: 4764.42 | bwd_inner_microstep: 4731.43 | bwd_allreduce_microstep: 32.92 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696
[2024-07-31 17:37:00,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.10 | bwd_microstep: 4947.09 | bwd_inner_microstep: 4903.79 | bwd_allreduce_microstep: 43.23 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 17:37:09,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.99 | bwd_microstep: 4909.33 | bwd_inner_microstep: 4889.95 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2141
[2024-07-31 17:37:18,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 17:37:18,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.69 | bwd_microstep: 5193.23 | bwd_inner_microstep: 4790.23 | bwd_allreduce_microstep: 402.93 | step_microstep: 181.88
[2024-07-31 17:37:18,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28497.09 | bwd: 40531.27 | bwd_inner: 39178.32 | bwd_allreduce: 1352.46 | step: 182.47
64%|██████▍ | 786/1230 [15:25:24<8:32:44, 69.29s/it]
{'loss': 1.1565, 'learning_rate': 6.091207422496487e-06, 'epoch': 0.64}
64%|██████▍ | 786/1230 [15:25:24<8:32:44, 69.29s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2337
[2024-07-31 17:37:27,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.59 | bwd_microstep: 5386.09 | bwd_inner_microstep: 4972.17 | bwd_allreduce_microstep: 413.86 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3763
[2024-07-31 17:37:36,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.85 | bwd_microstep: 5192.97 | bwd_inner_microstep: 5124.28 | bwd_allreduce_microstep: 68.63 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858
[2024-07-31 17:37:45,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.03 | bwd_microstep: 5177.33 | bwd_inner_microstep: 5150.93 | bwd_allreduce_microstep: 26.34 | step_microstep: 0.18
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3797
[2024-07-31 17:37:53,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.09 | bwd_microstep: 5190.62 | bwd_inner_microstep: 5152.86 | bwd_allreduce_microstep: 37.69 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200
[2024-07-31 17:38:02,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.75 | bwd_microstep: 5109.10 | bwd_inner_microstep: 4710.49 | bwd_allreduce_microstep: 398.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712
[2024-07-31 17:38:10,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.25 | bwd_microstep: 4909.49 | bwd_inner_microstep: 4861.34 | bwd_allreduce_microstep: 48.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634
[2024-07-31 17:38:19,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.37 | bwd_microstep: 5053.41 | bwd_inner_microstep: 4988.49 | bwd_allreduce_microstep: 64.85 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3719
[2024-07-31 17:38:28,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 17:38:28,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.16 | bwd_microstep: 5066.20 | bwd_inner_microstep: 5008.94 | bwd_allreduce_microstep: 57.19 | step_microstep: 181.77
[2024-07-31 17:38:28,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28573.00 | bwd: 41085.20 | bwd_inner: 39969.44 | bwd_allreduce: 1115.27 | step: 182.45
64%|██████▍ | 787/1230 [15:26:34<8:33:08, 69.50s/it]
{'loss': 1.1751, 'learning_rate': 6.066982497875107e-06, 'epoch': 0.64}
64%|██████▍ | 787/1230 [15:26:34<8:33:08, 69.50s/it]
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3529
[2024-07-31 17:38:37,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.39 | bwd_microstep: 5243.83 | bwd_inner_microstep: 5100.36 | bwd_allreduce_microstep: 143.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777
[2024-07-31 17:38:45,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.95 | bwd_microstep: 5108.75 | bwd_inner_microstep: 5066.88 | bwd_allreduce_microstep: 41.81 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784
[2024-07-31 17:38:54,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.73 | bwd_microstep: 5106.81 | bwd_inner_microstep: 5062.86 | bwd_allreduce_microstep: 43.89 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199
[2024-07-31 17:39:03,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.60 | bwd_microstep: 5223.91 | bwd_inner_microstep: 4819.36 | bwd_allreduce_microstep: 404.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 17:39:12,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.29 | bwd_microstep: 4989.25 | bwd_inner_microstep: 4955.59 | bwd_allreduce_microstep: 33.59 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 17:39:20,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.75 | bwd_microstep: 5038.09 | bwd_inner_microstep: 4982.97 | bwd_allreduce_microstep: 55.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650
[2024-07-31 17:39:29,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.43 | bwd_microstep: 5050.91 | bwd_inner_microstep: 4991.30 | bwd_allreduce_microstep: 59.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737
[2024-07-31 17:39:38,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 17:39:38,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.02 | bwd_microstep: 5225.46 | bwd_inner_microstep: 5162.20 | bwd_allreduce_microstep: 63.19 | step_microstep: 182.41
[2024-07-31 17:39:38,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28799.04 | bwd: 40987.00 | bwd_inner: 40141.45 | bwd_allreduce: 845.06 | step: 182.99
64%|██████▍ | 788/1230 [15:27:44<8:33:20, 69.69s/it]
{'loss': 1.1733, 'learning_rate': 6.042784846980542e-06, 'epoch': 0.64}
64%|██████▍ | 788/1230 [15:27:44<8:33:20, 69.69s/it]
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3972
[2024-07-31 17:39:47,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3842.39 | bwd_microstep: 5245.19 | bwd_inner_microstep: 5226.11 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2838
[2024-07-31 17:39:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.27 | bwd_microstep: 5155.05 | bwd_inner_microstep: 4750.34 | bwd_allreduce_microstep: 404.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 17:40:04,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.77 | bwd_microstep: 5139.59 | bwd_inner_microstep: 5063.13 | bwd_allreduce_microstep: 76.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603
[2024-07-31 17:40:13,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.69 | bwd_microstep: 5222.90 | bwd_inner_microstep: 5133.90 | bwd_allreduce_microstep: 88.93 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746
[2024-07-31 17:40:22,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.75 | bwd_microstep: 4997.21 | bwd_inner_microstep: 4977.87 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195
[2024-07-31 17:40:31,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.14 | bwd_microstep: 5093.09 | bwd_inner_microstep: 4695.60 | bwd_allreduce_microstep: 397.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716
[2024-07-31 17:40:39,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.39 | bwd_microstep: 4787.58 | bwd_inner_microstep: 4768.27 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675
[2024-07-31 17:40:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 17:40:47,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.20 | bwd_microstep: 4909.24 | bwd_inner_microstep: 4887.38 | bwd_allreduce_microstep: 21.79 | step_microstep: 181.50
[2024-07-31 
17:40:47,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28727.52 | bwd: 40549.83 | bwd_inner: 39502.54 | bwd_allreduce: 1046.80 | step: 182.08 64%|██████▍ | 789/1230 [15:28:53<8:32:01, 69.66s/it] {'loss': 1.1758, 'learning_rate': 6.018614637612733e-06, 'epoch': 0.64} 64%|██████▍ | 789/1230 [15:28:53<8:32:01, 69.66s/it]dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1320 [2024-07-31 17:40:57,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.86 | bwd_microstep: 5665.92 | bwd_inner_microstep: 5228.27 | bwd_allreduce_microstep: 437.58 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3828 [2024-07-31 17:41:06,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.68 | bwd_microstep: 5319.42 | bwd_inner_microstep: 5232.68 | bwd_allreduce_microstep: 86.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-31 17:41:15,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.77 | bwd_microstep: 5244.11 | bwd_inner_microstep: 4837.60 | bwd_allreduce_microstep: 406.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 17:41:23,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.29 | bwd_microstep: 5030.08 | bwd_inner_microstep: 5004.66 | bwd_allreduce_microstep: 25.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 17:41:32,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.81 | bwd_microstep: 4981.31 | bwd_inner_microstep: 4961.94 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-31 17:41:41,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.22 | 
bwd_microstep: 5035.96 | bwd_inner_microstep: 4978.87 | bwd_allreduce_microstep: 57.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 17:41:49,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.71 | bwd_microstep: 4888.14 | bwd_inner_microstep: 4868.70 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 17:41:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 17:41:57,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.89 | bwd_microstep: 4722.57 | bwd_inner_microstep: 4698.34 | bwd_allreduce_microstep: 24.15 | step_microstep: 181.20 [2024-07-31 17:41:57,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28807.13 | bwd: 40887.51 | bwd_inner: 39811.01 | bwd_allreduce: 1076.00 | step: 181.78 64%|██████▍ | 790/1230 [15:30:03<8:31:39, 69.77s/it] {'loss': 1.1392, 'learning_rate': 5.99447203738134e-06, 'epoch': 0.64} 64%|██████▍ | 790/1230 [15:30:03<8:31:39, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2318 [2024-07-31 17:42:06,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.46 | bwd_microstep: 5441.88 | bwd_inner_microstep: 5025.87 | bwd_allreduce_microstep: 415.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580 [2024-07-31 17:42:14,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3208.66 | bwd_microstep: 4861.84 | bwd_inner_microstep: 4809.79 | bwd_allreduce_microstep: 51.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 17:42:23,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.46 | bwd_microstep: 5140.40 | bwd_inner_microstep: 5092.95 | 
bwd_allreduce_microstep: 47.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3620 [2024-07-31 17:42:32,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.26 | bwd_microstep: 5189.06 | bwd_inner_microstep: 5093.70 | bwd_allreduce_microstep: 95.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 17:42:40,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.53 | bwd_microstep: 4847.31 | bwd_inner_microstep: 4802.65 | bwd_allreduce_microstep: 44.60 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 17:42:48,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.18 | bwd_microstep: 4974.77 | bwd_inner_microstep: 4939.81 | bwd_allreduce_microstep: 34.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 17:42:57,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.19 | bwd_microstep: 5104.86 | bwd_inner_microstep: 4708.95 | bwd_allreduce_microstep: 395.84 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 17:43:06,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 17:43:06,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.15 | bwd_microstep: 4982.30 | bwd_inner_microstep: 4932.99 | bwd_allreduce_microstep: 49.24 | step_microstep: 181.93 [2024-07-31 17:43:06,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27474.78 | bwd: 40542.41 | bwd_inner: 39406.66 | bwd_allreduce: 1135.25 | step: 182.64 64%|██████▍ | 791/1230 [15:31:12<8:27:22, 69.34s/it] {'loss': 1.1374, 'learning_rate': 5.9703572137045495e-06, 'epoch': 0.64} 64%|██████▍ | 791/1230 [15:31:12<8:27:22, 69.34s/it]dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3932 [2024-07-31 17:43:15,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.94 | bwd_microstep: 5191.10 | bwd_inner_microstep: 5162.69 | bwd_allreduce_microstep: 28.34 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3620 [2024-07-31 17:43:24,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.69 | bwd_microstep: 5126.77 | bwd_inner_microstep: 5031.45 | bwd_allreduce_microstep: 95.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-31 17:43:33,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.95 | bwd_microstep: 5074.57 | bwd_inner_microstep: 5054.43 | bwd_allreduce_microstep: 20.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 17:43:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.33 | bwd_microstep: 5009.12 | bwd_inner_microstep: 4988.28 | bwd_allreduce_microstep: 20.76 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3629 [2024-07-31 17:43:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.56 | bwd_microstep: 5064.12 | bwd_inner_microstep: 4982.89 | bwd_allreduce_microstep: 81.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 17:43:59,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.79 | bwd_microstep: 5070.09 | bwd_inner_microstep: 5006.48 | bwd_allreduce_microstep: 63.54 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3644 [2024-07-31 17:44:07,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.04 | bwd_microstep: 5184.31 | bwd_inner_microstep: 5085.02 | 
bwd_allreduce_microstep: 99.23 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1099 [2024-07-31 17:44:16,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 17:44:16,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.60 | bwd_microstep: 5199.99 | bwd_inner_microstep: 4797.39 | bwd_allreduce_microstep: 402.53 | step_microstep: 181.45 [2024-07-31 17:44:16,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29227.80 | bwd: 40920.05 | bwd_inner: 40108.59 | bwd_allreduce: 810.96 | step: 182.04 64%|██████▍ | 792/1230 [15:32:22<8:28:43, 69.69s/it] {'loss': 1.1059, 'learning_rate': 5.94627033380794e-06, 'epoch': 0.64} 64%|██████▍ | 792/1230 [15:32:22<8:28:43, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3924 [2024-07-31 17:44:25,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.84 | bwd_microstep: 5333.38 | bwd_inner_microstep: 5276.58 | bwd_allreduce_microstep: 56.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3858 [2024-07-31 17:44:34,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.51 | bwd_microstep: 5113.21 | bwd_inner_microstep: 5074.65 | bwd_allreduce_microstep: 38.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 17:44:43,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.38 | bwd_microstep: 4955.76 | bwd_inner_microstep: 4926.92 | bwd_allreduce_microstep: 28.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 17:44:52,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.18 | bwd_microstep: 5168.32 | bwd_inner_microstep: 5092.64 | bwd_allreduce_microstep: 75.61 | step_microstep: 0.08 dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3716 [2024-07-31 17:45:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.95 | bwd_microstep: 4988.35 | bwd_inner_microstep: 4968.98 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2160 [2024-07-31 17:45:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.65 | bwd_microstep: 5224.71 | bwd_inner_microstep: 4820.11 | bwd_allreduce_microstep: 404.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 17:45:17,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3030.50 | bwd_microstep: 4920.60 | bwd_inner_microstep: 4541.62 | bwd_allreduce_microstep: 378.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 17:45:26,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 17:45:26,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.85 | bwd_microstep: 4968.42 | bwd_inner_microstep: 4921.69 | bwd_allreduce_microstep: 46.66 | step_microstep: 181.95 [2024-07-31 17:45:26,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28561.77 | bwd: 40672.74 | bwd_inner: 39623.13 | bwd_allreduce: 1049.13 | step: 182.52 64%|██████▍ | 793/1230 [15:33:32<8:27:17, 69.65s/it] {'loss': 1.1694, 'learning_rate': 5.922211564723299e-06, 'epoch': 0.64} 64%|██████▍ | 793/1230 [15:33:32<8:27:17, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-31 17:45:35,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.87 | bwd_microstep: 5186.57 | bwd_inner_microstep: 5165.49 | bwd_allreduce_microstep: 21.01 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2038 [2024-07-31 17:45:44,158] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3401.85 | bwd_microstep: 5332.48 | bwd_inner_microstep: 4922.59 | bwd_allreduce_microstep: 409.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-31 17:45:53,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.64 | bwd_microstep: 5239.72 | bwd_inner_microstep: 5154.94 | bwd_allreduce_microstep: 84.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 17:46:01,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.46 | bwd_microstep: 5136.48 | bwd_inner_microstep: 4738.87 | bwd_allreduce_microstep: 397.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 17:46:09,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.93 | bwd_microstep: 4747.36 | bwd_inner_microstep: 4722.59 | bwd_allreduce_microstep: 24.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-07-31 17:46:17,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.69 | bwd_microstep: 4903.61 | bwd_inner_microstep: 4525.17 | bwd_allreduce_microstep: 378.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2108 [2024-07-31 17:46:26,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.95 | bwd_microstep: 5106.66 | bwd_inner_microstep: 4709.00 | bwd_allreduce_microstep: 397.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 17:46:35,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 17:46:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.33 | bwd_microstep: 4999.53 | bwd_inner_microstep: 4941.27 | 
bwd_allreduce_microstep: 58.19 | step_microstep: 182.65 [2024-07-31 17:46:35,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27657.62 | bwd: 40652.39 | bwd_inner: 38879.85 | bwd_allreduce: 1772.05 | step: 183.23 65%|██████▍ | 794/1230 [15:34:40<8:23:55, 69.35s/it] {'loss': 1.1634, 'learning_rate': 5.8981810732875024e-06, 'epoch': 0.65} 65%|██████▍ | 794/1230 [15:34:40<8:23:55, 69.35s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4096 [2024-07-31 17:46:44,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.78 | bwd_microstep: 5531.89 | bwd_inner_microstep: 5490.25 | bwd_allreduce_microstep: 41.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3926 [2024-07-31 17:46:53,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.20 | bwd_microstep: 5080.83 | bwd_inner_microstep: 5044.40 | bwd_allreduce_microstep: 36.37 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2861 [2024-07-31 17:47:01,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.11 | bwd_microstep: 5195.02 | bwd_inner_microstep: 4788.86 | bwd_allreduce_microstep: 406.09 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2294 [2024-07-31 17:47:10,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.68 | bwd_microstep: 5143.81 | bwd_inner_microstep: 4743.86 | bwd_allreduce_microstep: 399.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 17:47:19,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.95 | bwd_microstep: 5115.46 | bwd_inner_microstep: 5048.60 | bwd_allreduce_microstep: 66.79 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 17:47:28,008] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.99 | bwd_microstep: 4991.70 | bwd_inner_microstep: 4972.32 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 17:47:36,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.84 | bwd_microstep: 5185.75 | bwd_inner_microstep: 5109.96 | bwd_allreduce_microstep: 75.73 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 17:47:45,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 17:47:45,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.98 | bwd_microstep: 5090.00 | bwd_inner_microstep: 4695.08 | bwd_allreduce_microstep: 394.85 | step_microstep: 181.80 [2024-07-31 17:47:45,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28937.45 | bwd: 41334.44 | bwd_inner: 39893.27 | bwd_allreduce: 1440.68 | step: 182.38 65%|██████▍ | 795/1230 [15:35:51<8:25:30, 69.72s/it] {'loss': 1.1399, 'learning_rate': 5.87417902614131e-06, 'epoch': 0.65} 65%|██████▍ | 795/1230 [15:35:51<8:25:30, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3990 [2024-07-31 17:47:54,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.38 | bwd_microstep: 5210.59 | bwd_inner_microstep: 5181.06 | bwd_allreduce_microstep: 29.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3872 [2024-07-31 17:48:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3803.72 | bwd_microstep: 5122.13 | bwd_inner_microstep: 5102.73 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 17:48:12,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.06 | 
bwd_microstep: 5227.99 | bwd_inner_microstep: 5138.75 | bwd_allreduce_microstep: 89.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 17:48:21,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.03 | bwd_microstep: 5281.26 | bwd_inner_microstep: 5178.63 | bwd_allreduce_microstep: 102.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-31 17:48:29,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.90 | bwd_microstep: 5121.95 | bwd_inner_microstep: 4727.24 | bwd_allreduce_microstep: 394.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 17:48:38,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.37 | bwd_microstep: 5178.87 | bwd_inner_microstep: 5098.81 | bwd_allreduce_microstep: 79.99 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 17:48:47,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.70 | bwd_microstep: 5039.29 | bwd_inner_microstep: 4982.07 | bwd_allreduce_microstep: 57.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 17:48:56,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 17:48:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3453.19 | bwd_microstep: 5016.78 | bwd_inner_microstep: 4627.16 | bwd_allreduce_microstep: 389.56 | step_microstep: 181.27 [2024-07-31 17:48:56,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28878.28 | bwd: 41198.83 | bwd_inner: 40036.37 | bwd_allreduce: 1161.97 | step: 181.98 65%|██████▍ | 796/1230 [15:37:01<8:25:50, 69.93s/it] {'loss': 1.17, 'learning_rate': 5.850205589728239e-06, 'epoch': 0.65} 65%|██████▍ | 
796/1230 [15:37:01<8:25:50, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3888 [2024-07-31 17:49:05,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.73 | bwd_microstep: 5330.19 | bwd_inner_microstep: 5270.12 | bwd_allreduce_microstep: 60.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-31 17:49:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.03 | bwd_microstep: 5169.52 | bwd_inner_microstep: 5119.33 | bwd_allreduce_microstep: 50.12 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2080 [2024-07-31 17:49:22,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.66 | bwd_microstep: 5166.41 | bwd_inner_microstep: 4763.82 | bwd_allreduce_microstep: 402.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3618 [2024-07-31 17:49:31,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.73 | bwd_microstep: 4979.81 | bwd_inner_microstep: 4908.56 | bwd_allreduce_microstep: 71.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-31 17:49:39,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.24 | bwd_microstep: 5053.31 | bwd_inner_microstep: 5034.00 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 17:49:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.13 | bwd_microstep: 4981.24 | bwd_inner_microstep: 4961.91 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3708 [2024-07-31 17:49:57,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.93 | bwd_microstep: 
5012.24 | bwd_inner_microstep: 4946.51 | bwd_allreduce_microstep: 65.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2128 [2024-07-31 17:50:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 17:50:06,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.53 | bwd_microstep: 5159.92 | bwd_inner_microstep: 4758.46 | bwd_allreduce_microstep: 401.38 | step_microstep: 182.30 [2024-07-31 17:50:06,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28817.86 | bwd: 40852.64 | bwd_inner: 39762.66 | bwd_allreduce: 1089.49 | step: 182.87 65%|██████▍ | 797/1230 [15:38:11<8:24:49, 69.95s/it] {'loss': 1.1192, 'learning_rate': 5.826260930293417e-06, 'epoch': 0.65} 65%|██████▍ | 797/1230 [15:38:11<8:24:49, 69.95s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3939 [2024-07-31 17:50:15,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.15 | bwd_microstep: 5464.09 | bwd_inner_microstep: 5390.27 | bwd_allreduce_microstep: 73.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2329 [2024-07-31 17:50:24,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.39 | bwd_microstep: 5372.48 | bwd_inner_microstep: 4956.58 | bwd_allreduce_microstep: 415.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-31 17:50:33,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.84 | bwd_microstep: 5191.46 | bwd_inner_microstep: 4787.66 | bwd_allreduce_microstep: 403.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-31 17:50:41,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.95 | bwd_microstep: 5232.19 | bwd_inner_microstep: 4824.70 | bwd_allreduce_microstep: 407.42 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-31 17:50:50,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.00 | bwd_microstep: 5195.32 | bwd_inner_microstep: 5117.40 | bwd_allreduce_microstep: 77.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 17:50:59,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.00 | bwd_microstep: 5028.01 | bwd_inner_microstep: 4969.91 | bwd_allreduce_microstep: 58.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 17:51:07,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.44 | bwd_microstep: 5104.56 | bwd_inner_microstep: 4708.10 | bwd_allreduce_microstep: 396.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-07-31 17:51:16,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 17:51:16,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.34 | bwd_microstep: 4887.92 | bwd_inner_microstep: 4865.67 | bwd_allreduce_microstep: 22.18 | step_microstep: 182.43 [2024-07-31 17:51:16,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28855.02 | bwd: 41476.00 | bwd_inner: 39620.23 | bwd_allreduce: 1855.28 | step: 182.99 65%|██████▍ | 798/1230 [15:39:22<8:25:11, 70.17s/it] {'loss': 1.1306, 'learning_rate': 5.802345213882399e-06, 'epoch': 0.65} 65%|██████▍ | 798/1230 [15:39:22<8:25:11, 70.17s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3860 [2024-07-31 17:51:25,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.73 | bwd_microstep: 5456.62 | bwd_inner_microstep: 5369.42 | bwd_allreduce_microstep: 87.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3921
[2024-07-31 17:51:34,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.97 | bwd_microstep: 5148.36 | bwd_inner_microstep: 5129.03 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3960
[2024-07-31 17:51:43,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.67 | bwd_microstep: 5258.61 | bwd_inner_microstep: 5212.29 | bwd_allreduce_microstep: 46.26 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171
[2024-07-31 17:51:52,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.64 | bwd_microstep: 5081.43 | bwd_inner_microstep: 4687.06 | bwd_allreduce_microstep: 394.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193
[2024-07-31 17:52:00,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.16 | bwd_microstep: 4938.86 | bwd_inner_microstep: 4557.24 | bwd_allreduce_microstep: 381.56 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708
[2024-07-31 17:52:08,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.67 | bwd_microstep: 4892.88 | bwd_inner_microstep: 4873.49 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769
[2024-07-31 17:52:17,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.49 | bwd_microstep: 4935.55 | bwd_inner_microstep: 4908.19 | bwd_allreduce_microstep: 27.29 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3679
[2024-07-31 17:52:26,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 17:52:26,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.06 | bwd_microstep: 5190.13 | bwd_inner_microstep: 5100.61 | bwd_allreduce_microstep: 89.45 | step_microstep: 183.05
[2024-07-31 17:52:26,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28534.28 | bwd: 40902.44 | bwd_inner: 39837.26 | bwd_allreduce: 1064.69 | step: 183.64
65%|██████▍ | 799/1230 [15:40:32<8:23:10, 70.05s/it]
{'loss': 1.1511, 'learning_rate': 5.778458606340037e-06, 'epoch': 0.65}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3532
[2024-07-31 17:52:35,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.93 | bwd_microstep: 5304.81 | bwd_inner_microstep: 5196.07 | bwd_allreduce_microstep: 108.68 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898
[2024-07-31 17:52:43,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3271.29 | bwd_microstep: 4933.81 | bwd_inner_microstep: 4914.50 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2245
[2024-07-31 17:52:51,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.92 | bwd_microstep: 5016.98 | bwd_inner_microstep: 4632.60 | bwd_allreduce_microstep: 384.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666
[2024-07-31 17:53:00,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.63 | bwd_microstep: 5062.39 | bwd_inner_microstep: 5007.26 | bwd_allreduce_microstep: 55.05 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715
[2024-07-31 17:53:08,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3250.16 | bwd_microstep: 4785.66 | bwd_inner_microstep: 4766.21 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2138
[2024-07-31 17:53:17,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.65 | bwd_microstep: 5154.57 | bwd_inner_microstep: 4754.47 | bwd_allreduce_microstep: 400.03 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 17:53:25,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.81 | bwd_microstep: 4979.21 | bwd_inner_microstep: 4959.81 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127
[2024-07-31 17:53:34,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50
[2024-07-31 17:53:34,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.20 | bwd_microstep: 5099.90 | bwd_inner_microstep: 4703.39 | bwd_allreduce_microstep: 396.44 | step_microstep: 181.37
[2024-07-31 17:53:34,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27562.51 | bwd: 40337.31 | bwd_inner: 38934.24 | bwd_allreduce: 1402.55 | step: 182.05
65%|██████▌ | 800/1230 [15:41:40<8:18:05, 69.50s/it]
{'loss': 1.1951, 'learning_rate': 5.754601273309333e-06, 'epoch': 0.65}
[INFO|trainer.py:2936] 2024-07-31 17:54:01,459 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800
[INFO|configuration_utils.py:473] 2024-07-31 17:54:01,461 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/config.json
[INFO|configuration_utils.py:594] 2024-07-31 17:54:01,461 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/generation_config.json
[INFO|modeling_utils.py:2501] 2024-07-31 17:54:55,091 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2433] 2024-07-31 17:54:55,093 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-07-31 17:54:55,093 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-07-31 17:54:55,093 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/added_tokens.json
[2024-07-31 17:54:55,133] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step800 is about to be saved!
[2024-07-31 17:54:57,905] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-31 17:54:57,905] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-31 17:54:59,152] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-31 17:54:59,155] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-31 17:55:58,655] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-31 17:55:58,655] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-07-31 17:55:58,742] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step800 is ready now!
[INFO|trainer.py:3028] 2024-07-31 17:55:58,770 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/checkpoint-600] due to args.save_total_limit
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1857
[2024-07-31 17:56:40,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.07 | bwd_microstep: 5290.56 | bwd_inner_microstep: 4882.62 | bwd_allreduce_microstep: 407.88 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778
[2024-07-31 17:56:48,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3240.21 | bwd_microstep: 4937.53 | bwd_inner_microstep: 4901.47 | bwd_allreduce_microstep: 35.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763
[2024-07-31 17:56:57,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.67 | bwd_microstep: 5199.41 | bwd_inner_microstep: 5139.12 | bwd_allreduce_microstep: 60.22 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3813
[2024-07-31 17:57:06,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.68 | bwd_microstep: 5063.50 | bwd_inner_microstep: 5035.12 | bwd_allreduce_microstep: 28.31 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716
[2024-07-31 17:57:15,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.14 | bwd_microstep: 4962.39 | bwd_inner_microstep: 4943.04 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2111
[2024-07-31 17:57:22,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3005.13 | bwd_microstep: 4885.48 | bwd_inner_microstep: 4510.51 | bwd_allreduce_microstep: 374.90 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152
[2024-07-31 17:57:31,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.76 | bwd_microstep: 5077.81 | bwd_inner_microstep: 4683.93 | bwd_allreduce_microstep: 393.81 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3660
[2024-07-31 17:57:40,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 17:57:40,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.51 | bwd_microstep: 4945.33 | bwd_inner_microstep: 4908.47 | bwd_allreduce_microstep: 36.79 | step_microstep: 182.05
[2024-07-31 17:57:40,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27911.07 | bwd: 40362.00 | bwd_inner: 39004.22 | bwd_allreduce: 1357.29 | step: 182.64
65%|██████▌ | 801/1230 [15:45:46<14:34:30, 122.31s/it]
{'loss': 1.1594, 'learning_rate': 5.730773380230276e-06, 'epoch': 0.65}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3941
[2024-07-31 17:57:49,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3843.78 | bwd_microstep: 5364.93 | bwd_inner_microstep: 5320.63 | bwd_allreduce_microstep: 44.23 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581
[2024-07-31 17:57:58,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.98 | bwd_microstep: 5136.26 | bwd_inner_microstep: 5057.60 | bwd_allreduce_microstep: 78.60 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781
[2024-07-31 17:58:07,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.93 | bwd_microstep: 5292.15 | bwd_inner_microstep: 5222.04 | bwd_allreduce_microstep: 70.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 17:58:15,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.46 | bwd_microstep: 5077.50 | bwd_inner_microstep: 5033.44 | bwd_allreduce_microstep: 44.00 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804
[2024-07-31 17:58:24,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.38 | bwd_microstep: 4933.89 | bwd_inner_microstep: 4905.39 | bwd_allreduce_microstep: 28.44 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147
[2024-07-31 17:58:33,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.99 | bwd_microstep: 5143.33 | bwd_inner_microstep: 4741.70 | bwd_allreduce_microstep: 401.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710
[2024-07-31 17:58:41,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.71 | bwd_microstep: 5022.25 | bwd_inner_microstep: 4969.79 | bwd_allreduce_microstep: 52.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 17:58:50,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 17:58:50,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.54 | bwd_microstep: 4981.37 | bwd_inner_microstep: 4933.13 | bwd_allreduce_microstep: 48.18 | step_microstep: 182.58
[2024-07-31 17:58:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28819.68 | bwd: 40951.67 | bwd_inner: 40183.65 | bwd_allreduce: 767.53 | step: 183.27
65%|██████▌ | 802/1230 [15:46:56<12:40:45, 106.65s/it]
{'loss': 1.1566, 'learning_rate': 5.70697509233871e-06, 'epoch': 0.65}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3989
[2024-07-31 17:58:59,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.83 | bwd_microstep: 5318.97 | bwd_inner_microstep: 5277.60 | bwd_allreduce_microstep: 41.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818
[2024-07-31 17:59:08,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.36 | bwd_microstep: 5031.80 | bwd_inner_microstep: 5012.49 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 17:59:17,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.20 | bwd_microstep: 5165.53 | bwd_inner_microstep: 5106.09 | bwd_allreduce_microstep: 59.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 17:59:25,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.16 | bwd_microstep: 5196.17 | bwd_inner_microstep: 5111.32 | bwd_allreduce_microstep: 84.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645
[2024-07-31 17:59:34,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.27 | bwd_microstep: 5092.42 | bwd_inner_microstep: 5025.31 | bwd_allreduce_microstep: 67.05 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3202
[2024-07-31 17:59:43,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.61 | bwd_microstep: 5011.29 | bwd_inner_microstep: 4846.80 | bwd_allreduce_microstep: 164.42 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679
[2024-07-31 17:59:51,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.11 | bwd_microstep: 4874.49 | bwd_inner_microstep: 4829.75 | bwd_allreduce_microstep: 44.67 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3681
[2024-07-31 17:59:59,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73
[2024-07-31 17:59:59,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3096.91 | bwd_microstep: 4835.30 | bwd_inner_microstep: 4799.16 | bwd_allreduce_microstep: 36.07 | step_microstep: 181.60
[2024-07-31 17:59:59,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28152.36 | bwd: 40525.96 | bwd_inner: 40008.45 | bwd_allreduce: 517.03 | step: 182.19
65%|██████▌ | 803/1230 [15:48:05<11:18:38, 95.36s/it]
{'loss': 1.1454, 'learning_rate': 5.683206574665169e-06, 'epoch': 0.65}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 18:00:08,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.75 | bwd_microstep: 5300.47 | bwd_inner_microstep: 5275.12 | bwd_allreduce_microstep: 25.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3810
[2024-07-31 18:00:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.74 | bwd_microstep: 5149.44 | bwd_inner_microstep: 5118.88 | bwd_allreduce_microstep: 30.49 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2136
[2024-07-31 18:00:25,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.48 | bwd_microstep: 4961.68 | bwd_inner_microstep: 4576.90 | bwd_allreduce_microstep: 384.72 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751
[2024-07-31 18:00:33,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.54 | bwd_microstep: 4895.66 | bwd_inner_microstep: 4866.82 | bwd_allreduce_microstep: 28.78 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686
[2024-07-31 18:00:42,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.93 | bwd_microstep: 4979.94 | bwd_inner_microstep: 4944.28 | bwd_allreduce_microstep: 35.59 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182
[2024-07-31 18:00:51,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.91 | bwd_microstep: 5249.97 | bwd_inner_microstep: 4843.89 | bwd_allreduce_microstep: 406.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 18:00:59,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.48 | bwd_microstep: 4907.84 | bwd_inner_microstep: 4883.62 | bwd_allreduce_microstep: 24.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667
[2024-07-31 18:01:08,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 18:01:08,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.71 | bwd_microstep: 4688.72 | bwd_inner_microstep: 4666.90 | bwd_allreduce_microstep: 21.75 | step_microstep: 182.57
[2024-07-31 18:01:08,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28178.46 | bwd: 40133.71 | bwd_inner: 39176.35 | bwd_allreduce: 956.88 | step: 183.15
65%|██████▌ | 804/1230 [15:49:13<10:20:08, 87.34s/it]
{'loss': 1.1345, 'learning_rate': 5.6594679920337514e-06, 'epoch': 0.65}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2570
[2024-07-31 18:01:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3435.73 | bwd_microstep: 5269.53 | bwd_inner_microstep: 4865.80 | bwd_allreduce_microstep: 403.66 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838
[2024-07-31 18:01:24,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.37 | bwd_microstep: 4846.06 | bwd_inner_microstep: 4826.63 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762
[2024-07-31 18:01:33,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.06 | bwd_microstep: 5001.41 | bwd_inner_microstep: 4982.04 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741
[2024-07-31 18:01:42,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.17 | bwd_microstep: 5151.41 | bwd_inner_microstep: 5097.31 | bwd_allreduce_microstep: 54.03 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3615
[2024-07-31 18:01:51,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.43 | bwd_microstep: 5110.43 | bwd_inner_microstep: 5014.23 | bwd_allreduce_microstep: 96.13 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706
[2024-07-31 18:01:59,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.62 | bwd_microstep: 4894.00 | bwd_inner_microstep: 4874.66 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704
[2024-07-31 18:02:08,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.86 | bwd_microstep: 4919.02 | bwd_inner_microstep: 4894.11 | bwd_allreduce_microstep: 24.84 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689
[2024-07-31 18:02:17,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85
[2024-07-31 18:02:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.19 | bwd_microstep: 4918.57 | bwd_inner_microstep: 4894.86 | bwd_allreduce_microstep: 23.65 | step_microstep: 182.25
[2024-07-31 18:02:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28686.34 | bwd: 40110.43 | bwd_inner: 39449.60 | bwd_allreduce: 660.33 | step: 182.95
65%|██████▌ | 805/1230 [15:50:23<9:39:59, 81.88s/it]
{'loss': 1.1423, 'learning_rate': 5.635759509060969e-06, 'epoch': 0.65}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 18:02:26,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.69 | bwd_microstep: 5341.42 | bwd_inner_microstep: 5322.37 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839
[2024-07-31 18:02:35,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.87 | bwd_microstep: 5303.31 | bwd_inner_microstep: 5238.63 | bwd_allreduce_microstep: 64.62 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 18:02:44,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.98 | bwd_microstep: 5143.99 | bwd_inner_microstep: 5065.99 | bwd_allreduce_microstep: 77.94 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2074
[2024-07-31 18:02:52,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.51 | bwd_microstep: 5077.85 | bwd_inner_microstep: 4682.88 | bwd_allreduce_microstep: 394.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829
[2024-07-31 18:03:01,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.70 | bwd_microstep: 5047.49 | bwd_inner_microstep: 5028.14 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630
[2024-07-31 18:03:09,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3174.92 | bwd_microstep: 4681.80 | bwd_inner_microstep: 4658.35 | bwd_allreduce_microstep: 23.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834
[2024-07-31 18:03:18,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.90 | bwd_microstep: 5092.71 | bwd_inner_microstep: 5054.11 | bwd_allreduce_microstep: 38.53 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3622
[2024-07-31 18:03:26,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 18:03:26,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.64 | bwd_microstep: 5053.01 | bwd_inner_microstep: 4970.39 | bwd_allreduce_microstep: 82.55 | step_microstep: 181.74
[2024-07-31 18:03:26,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28702.12 | bwd: 40741.56 | bwd_inner: 40020.79 | bwd_allreduce: 720.27 | step: 182.33
66%|██████▌ | 806/1230 [15:51:32<9:12:59, 78.25s/it]
{'loss': 1.096, 'learning_rate': 5.612081290154607e-06, 'epoch': 0.66}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3987
[2024-07-31 18:03:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.28 | bwd_microstep: 5282.61 | bwd_inner_microstep: 5244.65 | bwd_allreduce_microstep: 37.89 | step_microstep: 0.19
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3800
[2024-07-31 18:03:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.58 | bwd_microstep: 5152.23 | bwd_inner_microstep: 5079.22 | bwd_allreduce_microstep: 72.94 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829
[2024-07-31 18:03:53,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.64 | bwd_microstep: 5061.44 | bwd_inner_microstep: 5042.11 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 18:04:02,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.41 | bwd_microstep: 5147.74 | bwd_inner_microstep: 5095.87 | bwd_allreduce_microstep: 51.80 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717
[2024-07-31 18:04:11,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.22 | bwd_microstep: 4985.51 | bwd_inner_microstep: 4966.18 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188
[2024-07-31 18:04:19,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.91 | bwd_microstep: 5218.78 | bwd_inner_microstep: 4812.73 | bwd_allreduce_microstep: 405.99 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644
[2024-07-31 18:04:28,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.78 | bwd_microstep: 5248.57 | bwd_inner_microstep: 5152.06 | bwd_allreduce_microstep: 96.44 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702
[2024-07-31 18:04:37,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52
[2024-07-31 18:04:37,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.19 | bwd_microstep: 5102.88 | bwd_inner_microstep: 5028.30 | bwd_allreduce_microstep: 74.51 | step_microstep: 181.75
[2024-07-31 18:04:37,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29351.92 | bwd: 41199.75 | bwd_inner: 40421.07 | bwd_allreduce: 778.18 | step: 182.43
66%|██████▌ | 807/1230 [15:52:43<8:56:19, 76.07s/it]
{'loss': 1.1761, 'learning_rate': 5.588433499512576e-06, 'epoch': 0.66}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812
[2024-07-31 18:04:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.90 | bwd_microstep: 5224.93 | bwd_inner_microstep: 5188.42 | bwd_allreduce_microstep: 36.44 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241
[2024-07-31 18:04:55,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3296.70 | bwd_microstep: 5147.77 | bwd_inner_microstep: 4749.31 | bwd_allreduce_microstep: 398.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822
[2024-07-31 18:05:04,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.02 | bwd_microstep: 5193.55 | bwd_inner_microstep: 5138.91 | bwd_allreduce_microstep: 54.58 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3766
[2024-07-31 18:05:13,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.69 | bwd_microstep: 5185.98 | bwd_inner_microstep: 5109.97 | bwd_allreduce_microstep: 75.94 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746
[2024-07-31 18:05:21,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.26 | bwd_microstep: 5002.70 | bwd_inner_microstep: 4983.38 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757
[2024-07-31 18:05:30,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.72 | bwd_microstep: 5029.78 | bwd_inner_microstep: 5005.76 | bwd_allreduce_microstep: 23.95 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765
[2024-07-31 18:05:39,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.11 | bwd_microstep: 5005.84 | bwd_inner_microstep: 4986.44 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689
[2024-07-31 18:05:48,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75
[2024-07-31 18:05:48,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.68 | bwd_microstep: 4922.48 | bwd_inner_microstep: 4899.25 | bwd_allreduce_microstep: 23.16 | step_microstep: 182.20
[2024-07-31 18:05:48,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29265.99 | bwd: 40713.01 | bwd_inner: 40061.39 | bwd_allreduce: 651.11 | step: 182.81
66%|██████▌ | 808/1230 [15:53:54<8:42:53, 74.35s/it]
{'loss': 1.1247, 'learning_rate': 5.564816301121792e-06, 'epoch': 0.66}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 18:05:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.14 | bwd_microstep: 5233.38 | bwd_inner_microstep: 5150.65 | bwd_allreduce_microstep: 82.66 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863
[2024-07-31 18:06:06,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.01 | bwd_microstep: 5115.94 | bwd_inner_microstep: 5096.58 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 18:06:15,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.34 | bwd_microstep: 5347.93 | bwd_inner_microstep: 5269.11 | bwd_allreduce_microstep: 78.74 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769
[2024-07-31 18:06:23,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.10 | bwd_microstep: 5009.68 | bwd_inner_microstep: 4990.25 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735
[2024-07-31 18:06:32,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.16 | bwd_microstep: 5229.92 | bwd_inner_microstep: 5164.84 | bwd_allreduce_microstep: 65.01 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3739
[2024-07-31 18:06:41,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.93 | bwd_microstep: 5204.24 | bwd_inner_microstep: 5161.60 | bwd_allreduce_microstep: 42.57 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 18:06:50,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.67 | bwd_microstep: 5044.28 | bwd_inner_microstep: 4986.86 | bwd_allreduce_microstep: 57.35 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2171
[2024-07-31 18:06:58,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70
[2024-07-31 18:06:58,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.82 | bwd_microstep: 5088.72 | bwd_inner_microstep: 4693.97 | bwd_allreduce_microstep: 394.68 | step_microstep: 181.53
[2024-07-31 18:06:58,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29152.06 | bwd: 41274.08 | bwd_inner: 40513.80 | bwd_allreduce: 759.78 | step: 182.23
66%|██████▌ | 809/1230 [15:55:04<8:34:06, 73.27s/it]
{'loss': 1.168, 'learning_rate': 5.541229858757011e-06, 'epoch': 0.66}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4009
[2024-07-31 18:07:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.00 | bwd_microstep: 5590.89 | bwd_inner_microstep: 5501.52 | bwd_allreduce_microstep: 89.30 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3793
[2024-07-31 18:07:17,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.53 | bwd_microstep: 5137.80 | bwd_inner_microstep: 5103.08 | bwd_allreduce_microstep: 34.66 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2267
[2024-07-31 18:07:25,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.97 | bwd_microstep: 5179.55 | bwd_inner_microstep: 4775.53 | bwd_allreduce_microstep: 403.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740
[2024-07-31 18:07:34,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.51 | bwd_microstep: 5096.94 | bwd_inner_microstep: 5053.43 | bwd_allreduce_microstep: 43.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638
[2024-07-31 18:07:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.57 | bwd_microstep: 5230.21 | bwd_inner_microstep: 5143.20 | bwd_allreduce_microstep: 86.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 18:07:51,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.57 | bwd_microstep: 4719.72 | bwd_inner_microstep: 4694.80 | bwd_allreduce_microstep: 24.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673
[2024-07-31 18:07:59,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.59 | bwd_microstep: 4700.81 | bwd_inner_microstep: 4678.82 | bwd_allreduce_microstep: 21.93 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657
[2024-07-31 18:08:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49
[2024-07-31 18:08:08,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.57 | bwd_microstep: 5234.92 | bwd_inner_microstep: 5120.18 | bwd_allreduce_microstep: 114.67 | step_microstep: 182.19
[2024-07-31 18:08:08,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28319.20 | bwd: 40890.82 | bwd_inner: 40070.49 | bwd_allreduce: 819.86 | step: 182.76
66%|██████▌ | 810/1230 [15:56:14<8:25:03, 72.15s/it]
{'loss': 1.1658, 'learning_rate': 5.517674335979721e-06, 'epoch': 0.66}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2382
[2024-07-31 18:08:17,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.13 | bwd_microstep: 5397.56 | bwd_inner_microstep: 4981.16 | bwd_allreduce_microstep: 416.33 | step_microstep: 0.09
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3885
[2024-07-31 18:08:26,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.11 | bwd_microstep: 5009.26 | bwd_inner_microstep: 4989.88 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760
[2024-07-31 18:08:35,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.82 | bwd_microstep: 5294.04 | bwd_inner_microstep: 5225.10 | bwd_allreduce_microstep: 68.87 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3611
[2024-07-31 18:08:43,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.88 | bwd_microstep: 5128.61 | bwd_inner_microstep: 5037.38 | bwd_allreduce_microstep: 91.16 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719
[2024-07-31 18:08:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.70 | bwd_microstep: 5234.36 | bwd_inner_microstep: 5170.15 | bwd_allreduce_microstep: 64.14 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162
[2024-07-31 18:09:01,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.01 | bwd_microstep: 5250.56 | bwd_inner_microstep: 4841.00 | bwd_allreduce_microstep: 409.49 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3712
[2024-07-31 18:09:10,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.79 | bwd_microstep: 5054.67 | bwd_inner_microstep: 4984.00 | bwd_allreduce_microstep: 70.60 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627
[2024-07-31 18:09:19,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 18:09:19,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.14 | bwd_microstep: 5111.58 | bwd_inner_microstep: 5034.88 | bwd_allreduce_microstep: 76.64 | step_microstep: 182.06
[2024-07-31 18:09:19,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28682.46 | bwd: 41480.63 | bwd_inner: 40263.49 | bwd_allreduce: 1216.64 | step: 182.78
66%|██████▌ | 811/1230 [15:57:24<8:20:23, 71.65s/it]
{'loss': 1.126, 'learning_rate': 5.494149896136998e-06, 'epoch': 0.66}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3938
[2024-07-31 18:09:28,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.83 | bwd_microstep: 5318.68 | bwd_inner_microstep: 5270.59 | bwd_allreduce_microstep: 48.02 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic
token length: 3862 [2024-07-31 18:09:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.59 | bwd_microstep: 5244.31 | bwd_inner_microstep: 5165.93 | bwd_allreduce_microstep: 78.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3767 [2024-07-31 18:09:45,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.94 | bwd_microstep: 5005.85 | bwd_inner_microstep: 4985.12 | bwd_allreduce_microstep: 20.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 18:09:54,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.94 | bwd_microstep: 5121.14 | bwd_inner_microstep: 5086.32 | bwd_allreduce_microstep: 34.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 18:10:03,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.62 | bwd_microstep: 5022.52 | bwd_inner_microstep: 4995.79 | bwd_allreduce_microstep: 26.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220 [2024-07-31 18:10:12,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.27 | bwd_microstep: 5190.87 | bwd_inner_microstep: 4789.70 | bwd_allreduce_microstep: 401.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 18:10:20,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.83 | bwd_microstep: 5003.55 | bwd_inner_microstep: 4984.24 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 18:10:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-31 18:10:29,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.84 | 
bwd_microstep: 5074.43 | bwd_inner_microstep: 5011.55 | bwd_allreduce_microstep: 62.82 | step_microstep: 182.13 [2024-07-31 18:10:29,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29422.76 | bwd: 40981.34 | bwd_inner: 40289.19 | bwd_allreduce: 691.66 | step: 182.73 66%|██████▌ | 812/1230 [15:58:35<8:17:16, 71.38s/it] {'loss': 1.1915, 'learning_rate': 5.470656702360363e-06, 'epoch': 0.66} 66%|██████▌ | 812/1230 [15:58:35<8:17:16, 71.38s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4021 [2024-07-31 18:10:39,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.40 | bwd_microstep: 5407.58 | bwd_inner_microstep: 5370.24 | bwd_allreduce_microstep: 37.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3843 [2024-07-31 18:10:47,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.26 | bwd_microstep: 5226.72 | bwd_inner_microstep: 5155.32 | bwd_allreduce_microstep: 71.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814 [2024-07-31 18:10:56,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.62 | bwd_microstep: 5172.53 | bwd_inner_microstep: 5134.04 | bwd_allreduce_microstep: 38.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 18:11:05,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.38 | bwd_microstep: 5097.59 | bwd_inner_microstep: 5027.07 | bwd_allreduce_microstep: 70.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 18:11:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.45 | bwd_microstep: 4966.96 | bwd_inner_microstep: 4937.06 | bwd_allreduce_microstep: 29.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3731 [2024-07-31 18:11:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.25 | bwd_microstep: 4973.07 | bwd_inner_microstep: 4953.84 | bwd_allreduce_microstep: 19.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 18:11:31,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.27 | bwd_microstep: 5194.41 | bwd_inner_microstep: 5111.61 | bwd_allreduce_microstep: 82.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-31 18:11:40,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 18:11:40,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.73 | bwd_microstep: 4788.73 | bwd_inner_microstep: 4758.48 | bwd_allreduce_microstep: 30.18 | step_microstep: 183.02 [2024-07-31 18:11:40,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29192.26 | bwd: 40827.58 | bwd_inner: 40447.61 | bwd_allreduce: 379.48 | step: 183.59 66%|██████▌ | 813/1230 [15:59:46<8:13:57, 71.07s/it] {'loss': 1.1914, 'learning_rate': 5.447194917564671e-06, 'epoch': 0.66} 66%|██████▌ | 813/1230 [15:59:46<8:13:57, 71.07s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898 [2024-07-31 18:11:49,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.51 | bwd_microstep: 5384.33 | bwd_inner_microstep: 5319.35 | bwd_allreduce_microstep: 64.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829 [2024-07-31 18:11:58,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.09 | bwd_microstep: 5062.55 | bwd_inner_microstep: 5037.59 | bwd_allreduce_microstep: 24.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3589 [2024-07-31 18:12:06,927] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3621.34 | bwd_microstep: 5241.91 | bwd_inner_microstep: 5135.83 | bwd_allreduce_microstep: 106.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 18:12:15,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.33 | bwd_microstep: 5097.38 | bwd_inner_microstep: 5052.15 | bwd_allreduce_microstep: 45.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 18:12:24,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.15 | bwd_microstep: 5159.46 | bwd_inner_microstep: 5082.69 | bwd_allreduce_microstep: 76.70 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 18:12:33,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.82 | bwd_microstep: 4912.10 | bwd_inner_microstep: 4890.92 | bwd_allreduce_microstep: 21.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 18:12:41,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.89 | bwd_microstep: 5187.95 | bwd_inner_microstep: 5111.32 | bwd_allreduce_microstep: 76.57 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 18:12:50,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 18:12:50,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.62 | bwd_microstep: 5155.95 | bwd_inner_microstep: 5085.23 | bwd_allreduce_microstep: 70.65 | step_microstep: 181.40 [2024-07-31 18:12:50,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29173.66 | bwd: 41201.63 | bwd_inner: 40715.01 | bwd_allreduce: 486.12 | step: 182.09 66%|██████▌ | 814/1230 [16:00:56<8:12:01, 70.96s/it] {'loss': 1.1638, 'learning_rate': 
5.423764704446954e-06, 'epoch': 0.66} 66%|██████▌ | 814/1230 [16:00:56<8:12:01, 70.96s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3540 [2024-07-31 18:12:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.78 | bwd_microstep: 5408.23 | bwd_inner_microstep: 5245.77 | bwd_allreduce_microstep: 162.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3883 [2024-07-31 18:13:08,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.15 | bwd_microstep: 5353.41 | bwd_inner_microstep: 5286.41 | bwd_allreduce_microstep: 66.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-31 18:13:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.33 | bwd_microstep: 5212.45 | bwd_inner_microstep: 5128.39 | bwd_allreduce_microstep: 84.00 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2841 [2024-07-31 18:13:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.06 | bwd_microstep: 5149.06 | bwd_inner_microstep: 4747.91 | bwd_allreduce_microstep: 401.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 18:13:34,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.37 | bwd_microstep: 4859.35 | bwd_inner_microstep: 4813.21 | bwd_allreduce_microstep: 46.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 18:13:43,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.14 | bwd_microstep: 5000.92 | bwd_inner_microstep: 4981.47 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 18:13:52,251] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 3635.36 | bwd_microstep: 5215.50 | bwd_inner_microstep: 5131.83 | bwd_allreduce_microstep: 83.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 18:14:01,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.88 [2024-07-31 18:14:01,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.96 | bwd_microstep: 4898.34 | bwd_inner_microstep: 4878.94 | bwd_allreduce_microstep: 19.32 | step_microstep: 182.51 [2024-07-31 18:14:01,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28757.06 | bwd: 41097.23 | bwd_inner: 40213.88 | bwd_allreduce: 882.88 | step: 183.10 66%|██████▋ | 815/1230 [16:02:06<8:09:13, 70.73s/it] {'loss': 1.1167, 'learning_rate': 5.400366225485322e-06, 'epoch': 0.66} 66%|██████▋ | 815/1230 [16:02:06<8:09:13, 70.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3532 [2024-07-31 18:14:09,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.18 | bwd_microstep: 5211.66 | bwd_inner_microstep: 5121.09 | bwd_allreduce_microstep: 90.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-31 18:14:18,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.33 | bwd_microstep: 5167.49 | bwd_inner_microstep: 5130.90 | bwd_allreduce_microstep: 36.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3904 [2024-07-31 18:14:27,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.65 | bwd_microstep: 5132.19 | bwd_inner_microstep: 5112.84 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762 [2024-07-31 18:14:36,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.53 | bwd_microstep: 5014.32 | 
bwd_inner_microstep: 4994.95 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 18:14:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.09 | bwd_microstep: 5161.32 | bwd_inner_microstep: 5082.35 | bwd_allreduce_microstep: 78.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3657 [2024-07-31 18:14:54,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.83 | bwd_microstep: 5116.98 | bwd_inner_microstep: 5038.42 | bwd_allreduce_microstep: 78.49 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3693 [2024-07-31 18:15:02,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.49 | bwd_microstep: 5073.51 | bwd_inner_microstep: 5001.49 | bwd_allreduce_microstep: 71.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3711 [2024-07-31 18:15:11,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 18:15:11,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.89 | bwd_microstep: 4978.49 | bwd_inner_microstep: 4918.13 | bwd_allreduce_microstep: 60.30 | step_microstep: 181.26 [2024-07-31 18:15:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29233.92 | bwd: 40855.94 | bwd_inner: 40400.10 | bwd_allreduce: 455.35 | step: 181.83 66%|██████▋ | 816/1230 [16:03:17<8:07:24, 70.64s/it] {'loss': 1.153, 'learning_rate': 5.376999642937817e-06, 'epoch': 0.66} 66%|██████▋ | 816/1230 [16:03:17<8:07:24, 70.64s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2018 [2024-07-31 18:15:20,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.78 | bwd_microstep: 5413.88 | bwd_inner_microstep: 4998.85 | bwd_allreduce_microstep: 414.96 | step_microstep: 0.08 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863
[2024-07-31 18:15:29,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.29 | bwd_microstep: 5108.27 | bwd_inner_microstep: 5088.97 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3926
[2024-07-31 18:15:38,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.00 | bwd_microstep: 5040.27 | bwd_inner_microstep: 5017.13 | bwd_allreduce_microstep: 23.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592
[2024-07-31 18:15:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.12 | bwd_microstep: 5194.05 | bwd_inner_microstep: 5116.39 | bwd_allreduce_microstep: 77.60 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2163
[2024-07-31 18:15:55,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.63 | bwd_microstep: 5082.15 | bwd_inner_microstep: 4687.37 | bwd_allreduce_microstep: 394.70 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153
[2024-07-31 18:16:04,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3457.29 | bwd_microstep: 5048.23 | bwd_inner_microstep: 4657.89 | bwd_allreduce_microstep: 390.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703
[2024-07-31 18:16:12,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.40 | bwd_microstep: 4890.41 | bwd_inner_microstep: 4871.00 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3659
[2024-07-31 18:16:21,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.88
[2024-07-31 18:16:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.08 | bwd_microstep: 4934.58 | bwd_inner_microstep: 4909.86 | bwd_allreduce_microstep: 24.65 | step_microstep: 181.91
[2024-07-31 18:16:21,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28972.50 | bwd: 40711.81 | bwd_inner: 39347.41 | bwd_allreduce: 1363.92 | step: 182.50
66%|██████▋ | 817/1230 [16:04:27<8:04:56, 70.45s/it]
{'loss': 1.151, 'learning_rate': 5.3536651188413e-06, 'epoch': 0.66}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2169
[2024-07-31 18:16:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.83 | bwd_microstep: 5537.83 | bwd_inner_microstep: 5110.08 | bwd_allreduce_microstep: 427.69 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2054
[2024-07-31 18:16:39,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.00 | bwd_microstep: 5262.21 | bwd_inner_microstep: 4850.96 | bwd_allreduce_microstep: 411.18 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150
[2024-07-31 18:16:47,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.62 | bwd_microstep: 4982.78 | bwd_inner_microstep: 4599.94 | bwd_allreduce_microstep: 382.78 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 18:16:56,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.28 | bwd_microstep: 5165.28 | bwd_inner_microstep: 4763.45 | bwd_allreduce_microstep: 401.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 18:17:05,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.79 | bwd_microstep: 5186.86 | bwd_inner_microstep: 5108.52 | bwd_allreduce_microstep: 78.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 18:17:13,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.65 | bwd_microstep: 5228.63 | bwd_inner_microstep: 4820.30 | bwd_allreduce_microstep: 408.27 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832
[2024-07-31 18:17:22,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.65 | bwd_microstep: 5065.61 | bwd_inner_microstep: 5046.19 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664
[2024-07-31 18:17:31,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 18:17:31,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.93 | bwd_microstep: 5065.20 | bwd_inner_microstep: 5004.56 | bwd_allreduce_microstep: 60.57 | step_microstep: 182.28
[2024-07-31 18:17:31,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28318.67 | bwd: 41494.39 | bwd_inner: 39303.92 | bwd_allreduce: 2189.97 | step: 182.98
67%|██████▋ | 818/1230 [16:05:37<8:03:07, 70.36s/it]
{'loss': 1.1679, 'learning_rate': 5.330362815010302e-06, 'epoch': 0.66}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693
[2024-07-31 18:17:40,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.36 | bwd_microstep: 5586.87 | bwd_inner_microstep: 5424.95 | bwd_allreduce_microstep: 161.86 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3817
[2024-07-31 18:17:49,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.79 | bwd_microstep: 5029.72 | bwd_inner_microstep: 5010.42 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 18:17:57,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.96 | bwd_microstep: 4902.86 | bwd_inner_microstep: 4849.49 | bwd_allreduce_microstep: 53.30 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3618
[2024-07-31 18:18:06,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.37 | bwd_microstep: 5011.60 | bwd_inner_microstep: 4942.12 | bwd_allreduce_microstep: 69.41 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2102
[2024-07-31 18:18:15,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.63 | bwd_microstep: 5157.65 | bwd_inner_microstep: 4756.53 | bwd_allreduce_microstep: 401.05 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711
[2024-07-31 18:18:23,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.51 | bwd_microstep: 4923.60 | bwd_inner_microstep: 4903.08 | bwd_allreduce_microstep: 20.45 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744
[2024-07-31 18:18:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.35 | bwd_microstep: 5001.91 | bwd_inner_microstep: 4982.63 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169
[2024-07-31 18:18:41,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 18:18:41,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3446.59 | bwd_microstep: 5009.76 | bwd_inner_microstep: 4623.37 | bwd_allreduce_microstep: 386.32 | step_microstep: 181.80
[2024-07-31 18:18:41,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28663.46 | bwd: 40623.94 | bwd_inner: 39492.55 | bwd_allreduce: 1130.90 | step: 182.40
67%|██████▋ | 819/1230 [16:06:47<8:00:26, 70.14s/it]
{'loss': 1.1837, 'learning_rate': 5.307092893035951e-06, 'epoch': 0.67}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 18:18:50,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3897.05 | bwd_microstep: 5378.49 | bwd_inner_microstep: 5355.39 | bwd_allreduce_microstep: 23.04 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794
[2024-07-31 18:18:59,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.60 | bwd_microstep: 5023.21 | bwd_inner_microstep: 5003.86 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3871
[2024-07-31 18:19:08,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.48 | bwd_microstep: 5225.12 | bwd_inner_microstep: 5175.38 | bwd_allreduce_microstep: 49.68 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3609
[2024-07-31 18:19:16,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3100.53 | bwd_microstep: 4953.04 | bwd_inner_microstep: 4889.86 | bwd_allreduce_microstep: 63.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729
[2024-07-31 18:19:24,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3350.20 | bwd_microstep: 4811.45 | bwd_inner_microstep: 4790.92 | bwd_allreduce_microstep: 20.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 18:19:32,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.65 | bwd_microstep: 4857.00 | bwd_inner_microstep: 4817.86 | bwd_allreduce_microstep: 39.08 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748
[2024-07-31 18:19:41,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.42 | bwd_microstep: 5012.81 | bwd_inner_microstep: 4993.42 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 18:19:50,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 18:19:50,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.96 | bwd_microstep: 4987.24 | bwd_inner_microstep: 4934.07 | bwd_allreduce_microstep: 53.10 | step_microstep: 182.66
[2024-07-31 18:19:50,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28234.80 | bwd: 40248.36 | bwd_inner: 39960.70 | bwd_allreduce: 287.18 | step: 183.23
67%|██████▋ | 820/1230 [16:07:55<7:56:33, 69.74s/it]
{'loss': 1.1779, 'learning_rate': 5.2838555142847925e-06, 'epoch': 0.67}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4052
[2024-07-31 18:19:59,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3889.69 | bwd_microstep: 5494.77 | bwd_inner_microstep: 5455.07 | bwd_allreduce_microstep: 39.63 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2386
[2024-07-31 18:20:08,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.82 | bwd_microstep: 5330.74 | bwd_inner_microstep: 4916.95 | bwd_allreduce_microstep: 413.72 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3695
[2024-07-31 18:20:17,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.44 | bwd_microstep: 5144.43 | bwd_inner_microstep: 5061.64 | bwd_allreduce_microstep: 82.73 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801
[2024-07-31 18:20:25,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.97 | bwd_microstep: 5034.19 | bwd_inner_microstep: 5014.85 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2180
[2024-07-31 18:20:34,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.96 | bwd_microstep: 5254.28 | bwd_inner_microstep: 4845.09 | bwd_allreduce_microstep: 409.12 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3646
[2024-07-31 18:20:43,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.56 | bwd_microstep: 5102.45 | bwd_inner_microstep: 5015.26 | bwd_allreduce_microstep: 87.12 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 18:20:52,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.24 | bwd_microstep: 4990.91 | bwd_inner_microstep: 4971.61 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695
[2024-07-31 18:21:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 18:21:01,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.14 | bwd_microstep: 5196.64 | bwd_inner_microstep: 5114.27 | bwd_allreduce_microstep: 82.31 | step_microstep: 181.40
[2024-07-31 18:21:01,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29345.73 | bwd: 41548.39 | bwd_inner: 40394.68 | bwd_allreduce: 1153.21 | step: 181.98
67%|██████▋ | 821/1230 [16:09:07<7:58:26, 70.19s/it]
{'loss': 1.1897, 'learning_rate': 5.260650839897719e-06, 'epoch': 0.67}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4012
[2024-07-31 18:21:10,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.97 | bwd_microstep: 5281.55 | bwd_inner_microstep: 5262.51 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216
[2024-07-31 18:21:18,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3158.59 | bwd_microstep: 5314.42 | bwd_inner_microstep: 4907.84 | bwd_allreduce_microstep: 406.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725
[2024-07-31 18:21:27,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.24 | bwd_microstep: 5207.53 | bwd_inner_microstep: 5143.27 | bwd_allreduce_microstep: 64.20 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 18:21:36,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.14 | bwd_microstep: 5132.82 | bwd_inner_microstep: 5081.42 | bwd_allreduce_microstep: 51.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685
[2024-07-31 18:21:45,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.38 | bwd_microstep: 4915.92 | bwd_inner_microstep: 4893.54 | bwd_allreduce_microstep: 22.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665
[2024-07-31 18:21:53,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.65 | bwd_microstep: 5005.79 | bwd_inner_microstep: 4953.34 | bwd_allreduce_microstep: 52.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669
[2024-07-31 18:22:02,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.28 | bwd_microstep: 5226.97 | bwd_inner_microstep: 5112.22 | bwd_allreduce_microstep: 114.68 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 18:22:11,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47
[2024-07-31 18:22:11,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.25 | bwd_microstep: 5021.41 | bwd_inner_microstep: 4963.84 | bwd_allreduce_microstep: 57.51 | step_microstep: 181.90
[2024-07-31 18:22:11,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28583.39 | bwd: 41106.39 | bwd_inner: 40317.92 | bwd_allreduce: 787.98 | step: 182.51
67%|██████▋ | 822/1230 [16:10:17<7:56:56, 70.14s/it]
{'loss': 1.1088, 'learning_rate': 5.237479030788813e-06, 'epoch': 0.67}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4068
[2024-07-31 18:22:20,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.15 | bwd_microstep: 5255.77 | bwd_inner_microstep: 5226.62 | bwd_allreduce_microstep: 29.07 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820
[2024-07-31 18:22:29,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.56 | bwd_microstep: 5292.44 | bwd_inner_microstep: 5226.87 | bwd_allreduce_microstep: 65.50 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814
[2024-07-31 18:22:38,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.47 | bwd_microstep: 5043.22 | bwd_inner_microstep: 5023.85 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611
[2024-07-31 18:22:46,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.79 | bwd_microstep: 5205.65 | bwd_inner_microstep: 5124.68 | bwd_allreduce_microstep: 80.91 | step_microstep: 0.19
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3707
[2024-07-31 18:22:55,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3421.79 | bwd_microstep: 4786.62 | bwd_inner_microstep: 4767.25 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127
[2024-07-31 18:23:03,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.57 | bwd_microstep: 5243.67 | bwd_inner_microstep: 4835.92 | bwd_allreduce_microstep: 407.68 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671
[2024-07-31 18:23:12,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.28 | bwd_microstep: 4982.40 | bwd_inner_microstep: 4945.19 | bwd_allreduce_microstep: 37.14 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3593
[2024-07-31 18:23:21,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 18:23:21,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.48 | bwd_microstep: 4970.19 | bwd_inner_microstep: 4924.59 | bwd_allreduce_microstep: 45.53 | step_microstep: 181.47
[2024-07-31 18:23:21,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29076.00 | bwd: 40779.94 | bwd_inner: 40074.92 | bwd_allreduce: 704.54 | step: 182.17
67%|██████▋ | 823/1230 [16:11:27<7:55:52, 70.15s/it]
{'loss': 1.1988, 'learning_rate': 5.214340247644278e-06, 'epoch': 0.67}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 18:23:30,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.27 | bwd_microstep: 5598.03 | bwd_inner_microstep: 5413.14 | bwd_allreduce_microstep: 184.82 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2044
[2024-07-31 18:23:39,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.59 | bwd_microstep: 5295.82 | bwd_inner_microstep: 4886.98 | bwd_allreduce_microstep: 408.77 | step_microstep: 0.08
dynamic ViT batch size:
8, images per sample: 4.0, dynamic token length: 2065 [2024-07-31 18:23:48,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.84 | bwd_microstep: 5235.16 | bwd_inner_microstep: 4828.78 | bwd_allreduce_microstep: 406.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 18:23:57,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.16 | bwd_microstep: 4986.31 | bwd_inner_microstep: 4965.53 | bwd_allreduce_microstep: 20.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 18:24:05,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.38 | bwd_microstep: 5109.95 | bwd_inner_microstep: 5042.38 | bwd_allreduce_microstep: 67.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 18:24:14,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.95 | bwd_microstep: 5004.31 | bwd_inner_microstep: 4984.92 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 18:24:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.41 | bwd_microstep: 5131.82 | bwd_inner_microstep: 4733.86 | bwd_allreduce_microstep: 397.90 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2121 [2024-07-31 18:24:32,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 18:24:32,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.87 | bwd_microstep: 5101.74 | bwd_inner_microstep: 4706.96 | bwd_allreduce_microstep: 394.72 | step_microstep: 181.44 [2024-07-31 18:24:32,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28900.36 | bwd: 41463.11 | bwd_inner: 39562.50 | bwd_allreduce: 1900.12 | 
step: 182.03 67%|██████▋ | 824/1230 [16:12:38<7:55:47, 70.31s/it] {'loss': 1.1311, 'learning_rate': 5.191234650921273e-06, 'epoch': 0.67} 67%|██████▋ | 824/1230 [16:12:38<7:55:47, 70.31s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3898 [2024-07-31 18:24:41,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3832.14 | bwd_microstep: 5216.12 | bwd_inner_microstep: 5185.15 | bwd_allreduce_microstep: 30.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 18:24:49,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3363.68 | bwd_microstep: 5089.10 | bwd_inner_microstep: 5026.92 | bwd_allreduce_microstep: 62.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-31 18:24:58,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.79 | bwd_microstep: 5031.08 | bwd_inner_microstep: 4999.51 | bwd_allreduce_microstep: 31.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 18:25:07,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.12 | bwd_microstep: 5159.79 | bwd_inner_microstep: 5103.89 | bwd_allreduce_microstep: 55.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 18:25:15,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.42 | bwd_microstep: 5106.92 | bwd_inner_microstep: 5063.42 | bwd_allreduce_microstep: 43.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 18:25:23,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.94 | bwd_microstep: 4793.23 | bwd_inner_microstep: 4773.84 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3666 [2024-07-31 18:25:32,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.35 | bwd_microstep: 5186.85 | bwd_inner_microstep: 5112.72 | bwd_allreduce_microstep: 74.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 18:25:40,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 18:25:40,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2992.74 | bwd_microstep: 4851.83 | bwd_inner_microstep: 4479.04 | bwd_allreduce_microstep: 372.73 | step_microstep: 181.47 [2024-07-31 18:25:40,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27850.07 | bwd: 40434.91 | bwd_inner: 39744.43 | bwd_allreduce: 690.01 | step: 182.06 67%|██████▋ | 825/1230 [16:13:46<7:51:11, 69.81s/it] {'loss': 1.1758, 'learning_rate': 5.168162400846839e-06, 'epoch': 0.67} 67%|██████▋ | 825/1230 [16:13:46<7:51:11, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2328 [2024-07-31 18:25:49,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.15 | bwd_microstep: 5511.05 | bwd_inner_microstep: 5090.51 | bwd_allreduce_microstep: 420.47 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3667 [2024-07-31 18:25:58,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.98 | bwd_microstep: 5264.20 | bwd_inner_microstep: 5191.62 | bwd_allreduce_microstep: 72.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-31 18:26:07,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.70 | bwd_microstep: 5129.81 | bwd_inner_microstep: 5085.05 | bwd_allreduce_microstep: 44.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 18:26:16,394] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.22 | bwd_microstep: 5153.37 | bwd_inner_microstep: 5071.86 | bwd_allreduce_microstep: 81.44 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2180 [2024-07-31 18:26:25,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.47 | bwd_microstep: 5156.97 | bwd_inner_microstep: 4756.45 | bwd_allreduce_microstep: 400.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 18:26:33,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.58 | bwd_microstep: 5021.48 | bwd_inner_microstep: 4964.48 | bwd_allreduce_microstep: 56.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 18:26:42,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.38 | bwd_microstep: 5072.94 | bwd_inner_microstep: 4678.61 | bwd_allreduce_microstep: 394.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-31 18:26:51,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-31 18:26:51,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.10 | bwd_microstep: 5106.56 | bwd_inner_microstep: 4710.63 | bwd_allreduce_microstep: 395.86 | step_microstep: 181.73 [2024-07-31 18:26:51,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28531.45 | bwd: 41416.37 | bwd_inner: 39549.15 | bwd_allreduce: 1866.74 | step: 182.31 67%|██████▋ | 826/1230 [16:14:56<7:50:59, 69.95s/it] {'loss': 1.1591, 'learning_rate': 5.145123657416755e-06, 'epoch': 0.67} 67%|██████▋ | 826/1230 [16:14:56<7:50:59, 69.95s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3934 [2024-07-31 18:27:00,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3849.33 | 
bwd_microstep: 5470.23 | bwd_inner_microstep: 5410.58 | bwd_allreduce_microstep: 59.59 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3564 [2024-07-31 18:27:09,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.68 | bwd_microstep: 5370.17 | bwd_inner_microstep: 5235.21 | bwd_allreduce_microstep: 134.90 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2061 [2024-07-31 18:27:18,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.29 | bwd_microstep: 5244.01 | bwd_inner_microstep: 4837.59 | bwd_allreduce_microstep: 406.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-31 18:27:27,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.19 | bwd_microstep: 5136.36 | bwd_inner_microstep: 5059.07 | bwd_allreduce_microstep: 77.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 18:27:35,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.55 | bwd_microstep: 4944.71 | bwd_inner_microstep: 4914.89 | bwd_allreduce_microstep: 29.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-31 18:27:43,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.83 | bwd_microstep: 4685.15 | bwd_inner_microstep: 4662.15 | bwd_allreduce_microstep: 22.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 18:27:52,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.58 | bwd_microstep: 5157.16 | bwd_inner_microstep: 5081.57 | bwd_allreduce_microstep: 75.52 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 18:28:01,161] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 18:28:01,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.40 | bwd_microstep: 4897.27 | bwd_inner_microstep: 4877.91 | bwd_allreduce_microstep: 19.29 | step_microstep: 182.39 [2024-07-31 18:28:01,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28847.77 | bwd: 40905.03 | bwd_inner: 40078.92 | bwd_allreduce: 825.64 | step: 182.99 67%|██████▋ | 827/1230 [16:16:07<7:50:06, 69.99s/it] {'loss': 1.1308, 'learning_rate': 5.122118580394473e-06, 'epoch': 0.67} 67%|██████▋ | 827/1230 [16:16:07<7:50:06, 69.99s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3340 [2024-07-31 18:28:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.82 | bwd_microstep: 5239.98 | bwd_inner_microstep: 5075.42 | bwd_allreduce_microstep: 164.49 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-31 18:28:18,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.71 | bwd_microstep: 5164.27 | bwd_inner_microstep: 5118.56 | bwd_allreduce_microstep: 45.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 18:28:27,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.75 | bwd_microstep: 5036.84 | bwd_inner_microstep: 5012.13 | bwd_allreduce_microstep: 24.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-31 18:28:36,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.06 | bwd_microstep: 5209.71 | bwd_inner_microstep: 4806.06 | bwd_allreduce_microstep: 403.58 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3637 [2024-07-31 18:28:44,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.13 | 
bwd_microstep: 5002.61 | bwd_inner_microstep: 4927.61 | bwd_allreduce_microstep: 74.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 18:28:53,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.97 | bwd_microstep: 5138.77 | bwd_inner_microstep: 5059.53 | bwd_allreduce_microstep: 79.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 18:29:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.28 | bwd_microstep: 5115.67 | bwd_inner_microstep: 5053.89 | bwd_allreduce_microstep: 61.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 18:29:11,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 18:29:11,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.75 | bwd_microstep: 5067.59 | bwd_inner_microstep: 5010.39 | bwd_allreduce_microstep: 57.13 | step_microstep: 182.86 [2024-07-31 18:29:11,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28792.37 | bwd: 40975.42 | bwd_inner: 40063.53 | bwd_allreduce: 911.42 | step: 183.56 67%|██████▋ | 828/1230 [16:17:17<7:49:09, 70.02s/it] {'loss': 1.1886, 'learning_rate': 5.099147329309959e-06, 'epoch': 0.67} 67%|██████▋ | 828/1230 [16:17:17<7:49:09, 70.02s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3999 [2024-07-31 18:29:20,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3888.22 | bwd_microstep: 5351.59 | bwd_inner_microstep: 5319.63 | bwd_allreduce_microstep: 31.89 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-31 18:29:29,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.10 | bwd_microstep: 5355.72 | bwd_inner_microstep: 4939.72 | bwd_allreduce_microstep: 
415.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3900 [2024-07-31 18:29:38,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.71 | bwd_microstep: 4979.02 | bwd_inner_microstep: 4956.87 | bwd_allreduce_microstep: 22.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 18:29:46,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.20 | bwd_microstep: 5135.42 | bwd_inner_microstep: 5065.80 | bwd_allreduce_microstep: 69.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-31 18:29:55,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.81 | bwd_microstep: 5209.98 | bwd_inner_microstep: 4805.65 | bwd_allreduce_microstep: 404.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3637 [2024-07-31 18:30:04,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.34 | bwd_microstep: 5142.68 | bwd_inner_microstep: 5043.71 | bwd_allreduce_microstep: 98.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 18:30:13,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.68 | bwd_microstep: 5055.57 | bwd_inner_microstep: 4997.34 | bwd_allreduce_microstep: 58.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-31 18:30:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-31 18:30:21,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.43 | bwd_microstep: 5123.86 | bwd_inner_microstep: 4726.04 | bwd_allreduce_microstep: 397.75 | step_microstep: 181.46 [2024-07-31 18:30:21,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28934.40 | bwd: 
41353.82 | bwd_inner: 39854.69 | bwd_allreduce: 1498.65 | step: 182.05 67%|██████▋ | 829/1230 [16:18:27<7:49:11, 70.20s/it] {'loss': 1.152, 'learning_rate': 5.076210063458622e-06, 'epoch': 0.67} 67%|██████▋ | 829/1230 [16:18:27<7:49:11, 70.20s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2388 [2024-07-31 18:30:30,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3122.92 | bwd_microstep: 5206.90 | bwd_inner_microstep: 4809.05 | bwd_allreduce_microstep: 397.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791 [2024-07-31 18:30:39,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.09 | bwd_microstep: 5184.59 | bwd_inner_microstep: 5133.56 | bwd_allreduce_microstep: 50.96 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2861 [2024-07-31 18:30:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.32 | bwd_microstep: 5132.15 | bwd_inner_microstep: 4729.11 | bwd_allreduce_microstep: 402.97 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2105 [2024-07-31 18:30:56,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.09 | bwd_microstep: 5226.56 | bwd_inner_microstep: 4822.19 | bwd_allreduce_microstep: 404.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 18:31:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.09 | bwd_microstep: 5153.83 | bwd_inner_microstep: 5072.22 | bwd_allreduce_microstep: 81.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 18:31:13,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.13 | bwd_microstep: 4923.40 | bwd_inner_microstep: 4896.29 | bwd_allreduce_microstep: 27.05 | 
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 18:31:22,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.12 | bwd_microstep: 4985.71 | bwd_inner_microstep: 4966.36 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 18:31:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 18:31:31,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.48 | bwd_microstep: 5049.12 | bwd_inner_microstep: 4657.29 | bwd_allreduce_microstep: 391.76 | step_microstep: 183.16 [2024-07-31 18:31:31,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28375.14 | bwd: 40862.23 | bwd_inner: 39086.01 | bwd_allreduce: 1775.74 | step: 183.74 67%|██████▋ | 830/1230 [16:19:37<7:46:44, 70.01s/it] {'loss': 1.1644, 'learning_rate': 5.0533069419002e-06, 'epoch': 0.67} 67%|██████▋ | 830/1230 [16:19:37<7:46:44, 70.01s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 18:31:40,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.88 | bwd_microstep: 5294.99 | bwd_inner_microstep: 5267.96 | bwd_allreduce_microstep: 26.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-07-31 18:31:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.61 | bwd_microstep: 5275.11 | bwd_inner_microstep: 5175.93 | bwd_allreduce_microstep: 99.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-31 18:31:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.61 | bwd_microstep: 5179.71 | bwd_inner_microstep: 5121.45 | bwd_allreduce_microstep: 58.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3595 [2024-07-31 18:32:07,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.22 | bwd_microstep: 5150.36 | bwd_inner_microstep: 5069.19 | bwd_allreduce_microstep: 81.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-31 18:32:15,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.64 | bwd_microstep: 4976.49 | bwd_inner_microstep: 4957.11 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 18:32:24,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.46 | bwd_microstep: 5187.58 | bwd_inner_microstep: 5112.43 | bwd_allreduce_microstep: 75.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3621 [2024-07-31 18:32:33,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.09 | bwd_microstep: 5122.35 | bwd_inner_microstep: 5025.68 | bwd_allreduce_microstep: 96.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 18:32:42,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 18:32:42,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.33 | bwd_microstep: 5134.55 | bwd_inner_microstep: 5064.54 | bwd_allreduce_microstep: 69.94 | step_microstep: 181.91 [2024-07-31 18:32:42,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29104.75 | bwd: 41321.12 | bwd_inner: 40794.22 | bwd_allreduce: 526.41 | step: 182.49 68%|██████▊ | 831/1230 [16:20:48<7:47:07, 70.24s/it] {'loss': 1.1246, 'learning_rate': 5.030438123457655e-06, 'epoch': 0.68} 68%|██████▊ | 831/1230 [16:20:48<7:47:07, 70.24s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3902 [2024-07-31 18:32:51,583] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.53 | bwd_microstep: 5596.17 | bwd_inner_microstep: 5497.96 | bwd_allreduce_microstep: 98.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3870 [2024-07-31 18:32:59,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3305.25 | bwd_microstep: 4922.53 | bwd_inner_microstep: 4903.19 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2302 [2024-07-31 18:33:08,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.38 | bwd_microstep: 5508.18 | bwd_inner_microstep: 5084.28 | bwd_allreduce_microstep: 423.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 18:33:17,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.46 | bwd_microstep: 5167.18 | bwd_inner_microstep: 5085.66 | bwd_allreduce_microstep: 81.46 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-31 18:33:26,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.66 | bwd_microstep: 5147.35 | bwd_inner_microstep: 4746.81 | bwd_allreduce_microstep: 400.48 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2143 [2024-07-31 18:33:35,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.06 | bwd_microstep: 5201.04 | bwd_inner_microstep: 4797.13 | bwd_allreduce_microstep: 403.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 18:33:44,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.50 | bwd_microstep: 5266.68 | bwd_inner_microstep: 5177.95 | bwd_allreduce_microstep: 88.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3679 [2024-07-31 18:33:52,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 18:33:52,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.08 | bwd_microstep: 4911.16 | bwd_inner_microstep: 4887.63 | bwd_allreduce_microstep: 23.46 | step_microstep: 182.73 [2024-07-31 18:33:52,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28672.83 | bwd: 41720.27 | bwd_inner: 40180.56 | bwd_allreduce: 1539.23 | step: 183.32 68%|██████▊ | 832/1230 [16:21:58<7:46:53, 70.39s/it] {'loss': 1.1455, 'learning_rate': 5.007603766716063e-06, 'epoch': 0.68} 68%|██████▊ | 832/1230 [16:21:58<7:46:53, 70.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4060 [2024-07-31 18:34:02,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.18 | bwd_microstep: 5400.06 | bwd_inner_microstep: 5360.11 | bwd_allreduce_microstep: 39.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572 [2024-07-31 18:34:10,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.59 | bwd_microstep: 5221.34 | bwd_inner_microstep: 5126.28 | bwd_allreduce_microstep: 94.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-31 18:34:19,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.59 | bwd_microstep: 4987.27 | bwd_inner_microstep: 4967.40 | bwd_allreduce_microstep: 19.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-31 18:34:27,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3049.98 | bwd_microstep: 5001.12 | bwd_inner_microstep: 4615.73 | bwd_allreduce_microstep: 385.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 18:34:36,463] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.86 | bwd_microstep: 5118.10 | bwd_inner_microstep: 5041.89 | bwd_allreduce_microstep: 76.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 18:34:45,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.93 | bwd_microstep: 4888.21 | bwd_inner_microstep: 4868.88 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 18:34:53,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.31 | bwd_microstep: 5070.69 | bwd_inner_microstep: 5025.06 | bwd_allreduce_microstep: 45.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 18:35:02,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 18:35:02,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.34 | bwd_microstep: 5053.49 | bwd_inner_microstep: 4993.69 | bwd_allreduce_microstep: 59.73 | step_microstep: 182.00 [2024-07-31 18:35:02,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28686.69 | bwd: 40740.26 | bwd_inner: 39998.98 | bwd_allreduce: 740.79 | step: 182.70 68%|██████▊ | 833/1230 [16:23:08<7:44:29, 70.20s/it] {'loss': 1.1194, 'learning_rate': 4.984804030021533e-06, 'epoch': 0.68} 68%|██████▊ | 833/1230 [16:23:08<7:44:29, 70.20s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3963 [2024-07-31 18:35:11,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.52 | bwd_microstep: 5237.63 | bwd_inner_microstep: 5218.54 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4061 [2024-07-31 18:35:20,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3839.85 | 
bwd_microstep: 5326.09 | bwd_inner_microstep: 5306.77 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3860 [2024-07-31 18:35:29,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.36 | bwd_microstep: 5017.87 | bwd_inner_microstep: 4998.40 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766 [2024-07-31 18:35:38,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.72 | bwd_microstep: 5229.12 | bwd_inner_microstep: 5166.41 | bwd_allreduce_microstep: 62.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3738 [2024-07-31 18:35:47,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.51 | bwd_microstep: 5150.42 | bwd_inner_microstep: 5073.34 | bwd_allreduce_microstep: 77.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 18:35:55,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.97 | bwd_microstep: 5128.59 | bwd_inner_microstep: 4731.32 | bwd_allreduce_microstep: 397.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 18:36:04,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.52 | bwd_microstep: 5008.17 | bwd_inner_microstep: 4953.85 | bwd_allreduce_microstep: 54.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 18:36:13,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 18:36:13,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.93 | bwd_microstep: 5113.92 | bwd_inner_microstep: 4716.15 | bwd_allreduce_microstep: 397.70 | step_microstep: 183.02 [2024-07-31 
18:36:13,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29124.27 | bwd: 41211.79 | bwd_inner: 40164.73 | bwd_allreduce: 1046.56 | step: 183.60 68%|██████▊ | 834/1230 [16:24:19<7:44:14, 70.34s/it] {'loss': 1.1803, 'learning_rate': 4.962039071480098e-06, 'epoch': 0.68} 68%|██████▊ | 834/1230 [16:24:19<7:44:14, 70.34s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4062 [2024-07-31 18:36:22,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.67 | bwd_microstep: 5379.23 | bwd_inner_microstep: 5360.17 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3862 [2024-07-31 18:36:31,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.22 | bwd_microstep: 5100.24 | bwd_inner_microstep: 5080.88 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-31 18:36:40,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.28 | bwd_microstep: 5043.82 | bwd_inner_microstep: 5018.04 | bwd_allreduce_microstep: 25.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3753 [2024-07-31 18:36:48,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.05 | bwd_microstep: 5050.85 | bwd_inner_microstep: 4987.11 | bwd_allreduce_microstep: 63.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-31 18:36:57,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.37 | bwd_microstep: 5208.68 | bwd_inner_microstep: 4805.46 | bwd_allreduce_microstep: 403.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 18:37:06,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.70 | 
bwd_microstep: 5065.20 | bwd_inner_microstep: 5001.60 | bwd_allreduce_microstep: 63.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 18:37:15,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.41 | bwd_microstep: 5160.31 | bwd_inner_microstep: 5086.35 | bwd_allreduce_microstep: 73.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 18:37:23,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 18:37:23,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.08 | bwd_microstep: 4981.39 | bwd_inner_microstep: 4934.52 | bwd_allreduce_microstep: 46.80 | step_microstep: 181.58 [2024-07-31 18:37:23,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29167.69 | bwd: 40989.67 | bwd_inner: 40274.07 | bwd_allreduce: 715.12 | step: 182.16 68%|██████▊ | 835/1230 [16:25:29<7:43:21, 70.38s/it] {'loss': 1.177, 'learning_rate': 4.939309048956622e-06, 'epoch': 0.68} 68%|██████▊ | 835/1230 [16:25:29<7:43:21, 70.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3904 [2024-07-31 18:37:32,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.79 | bwd_microstep: 5204.08 | bwd_inner_microstep: 5159.16 | bwd_allreduce_microstep: 44.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3868 [2024-07-31 18:37:41,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.60 | bwd_microstep: 5392.87 | bwd_inner_microstep: 5320.06 | bwd_allreduce_microstep: 72.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 18:37:50,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.06 | bwd_microstep: 5178.82 | bwd_inner_microstep: 5096.37 | bwd_allreduce_microstep: 
82.38 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2880 [2024-07-31 18:37:59,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.85 | bwd_microstep: 5196.42 | bwd_inner_microstep: 4791.92 | bwd_allreduce_microstep: 404.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 18:38:07,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.26 | bwd_microstep: 4885.30 | bwd_inner_microstep: 4865.93 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2121 [2024-07-31 18:38:16,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.31 | bwd_microstep: 5110.85 | bwd_inner_microstep: 4712.33 | bwd_allreduce_microstep: 398.46 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2110 [2024-07-31 18:38:25,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.36 | bwd_microstep: 5086.73 | bwd_inner_microstep: 4691.76 | bwd_allreduce_microstep: 394.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 18:38:34,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 18:38:34,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.61 | bwd_microstep: 4988.71 | bwd_inner_microstep: 4969.37 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.15 [2024-07-31 18:38:34,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28887.74 | bwd: 41043.77 | bwd_inner: 39606.84 | bwd_allreduce: 1436.44 | step: 181.73 68%|██████▊ | 836/1230 [16:26:40<7:41:57, 70.35s/it] {'loss': 1.125, 'learning_rate': 4.916614120073693e-06, 'epoch': 0.68} 68%|██████▊ | 836/1230 [16:26:40<7:41:57, 70.35s/it]dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2410 [2024-07-31 18:38:43,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.04 | bwd_microstep: 5376.24 | bwd_inner_microstep: 4961.29 | bwd_allreduce_microstep: 414.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-31 18:38:51,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.43 | bwd_microstep: 4943.22 | bwd_inner_microstep: 4559.03 | bwd_allreduce_microstep: 384.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-31 18:39:00,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.83 | bwd_microstep: 5236.23 | bwd_inner_microstep: 4829.57 | bwd_allreduce_microstep: 406.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-31 18:39:08,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.15 | bwd_microstep: 4981.44 | bwd_inner_microstep: 4962.07 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 18:39:17,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.61 | bwd_microstep: 5023.58 | bwd_inner_microstep: 4963.86 | bwd_allreduce_microstep: 59.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 18:39:26,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.58 | bwd_microstep: 4964.04 | bwd_inner_microstep: 4930.45 | bwd_allreduce_microstep: 33.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 18:39:34,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.32 | bwd_microstep: 5111.37 | bwd_inner_microstep: 5039.25 | bwd_allreduce_microstep: 72.06 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 18:39:43,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 18:39:43,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.45 | bwd_microstep: 4950.22 | bwd_inner_microstep: 4904.39 | bwd_allreduce_microstep: 45.77 | step_microstep: 182.41 [2024-07-31 18:39:43,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28514.31 | bwd: 40586.34 | bwd_inner: 39149.85 | bwd_allreduce: 1436.00 | step: 183.00 68%|██████▊ | 837/1230 [16:27:49<7:38:58, 70.07s/it] {'loss': 1.1602, 'learning_rate': 4.89395444221055e-06, 'epoch': 0.68} 68%|██████▊ | 837/1230 [16:27:49<7:38:58, 70.07s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2287 [2024-07-31 18:39:52,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.94 | bwd_microstep: 5445.17 | bwd_inner_microstep: 5027.33 | bwd_allreduce_microstep: 417.76 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3599 [2024-07-31 18:40:01,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3346.55 | bwd_microstep: 5085.17 | bwd_inner_microstep: 5004.79 | bwd_allreduce_microstep: 80.31 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2282 [2024-07-31 18:40:10,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.50 | bwd_microstep: 5422.48 | bwd_inner_microstep: 5003.80 | bwd_allreduce_microstep: 418.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-31 18:40:18,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.64 | bwd_microstep: 4816.78 | bwd_inner_microstep: 4793.54 | bwd_allreduce_microstep: 23.17 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2206 [2024-07-31 18:40:26,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3310.57 | bwd_microstep: 5042.04 | bwd_inner_microstep: 4650.10 | bwd_allreduce_microstep: 391.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 18:40:35,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.60 | bwd_microstep: 5186.60 | bwd_inner_microstep: 5107.47 | bwd_allreduce_microstep: 79.07 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673 [2024-07-31 18:40:44,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.25 | bwd_microstep: 5195.43 | bwd_inner_microstep: 5176.03 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-31 18:40:53,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 18:40:53,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.97 | bwd_microstep: 5109.68 | bwd_inner_microstep: 4711.69 | bwd_allreduce_microstep: 397.92 | step_microstep: 181.39 [2024-07-31 18:40:53,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27916.93 | bwd: 41303.32 | bwd_inner: 39474.69 | bwd_allreduce: 1828.12 | step: 182.09 68%|██████▊ | 838/1230 [16:28:59<7:36:48, 69.92s/it] {'loss': 1.1623, 'learning_rate': 4.871330172501979e-06, 'epoch': 0.68} 68%|██████▊ | 838/1230 [16:28:59<7:36:48, 69.92s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 18:41:01,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.59 | bwd_microstep: 5180.57 | bwd_inner_microstep: 5102.17 | bwd_allreduce_microstep: 78.33 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3206 [2024-07-31 18:41:10,875] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3630.74 | bwd_microstep: 5306.21 | bwd_inner_microstep: 5107.11 | bwd_allreduce_microstep: 199.03 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3737 [2024-07-31 18:41:19,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.67 | bwd_microstep: 5064.27 | bwd_inner_microstep: 5033.55 | bwd_allreduce_microstep: 30.65 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 18:41:27,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.26 | bwd_microstep: 4851.85 | bwd_inner_microstep: 4802.32 | bwd_allreduce_microstep: 49.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 18:41:36,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.24 | bwd_microstep: 5170.75 | bwd_inner_microstep: 5088.69 | bwd_allreduce_microstep: 81.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3641 [2024-07-31 18:41:45,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.64 | bwd_microstep: 5178.10 | bwd_inner_microstep: 5078.68 | bwd_allreduce_microstep: 99.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 18:41:53,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.42 | bwd_microstep: 5025.19 | bwd_inner_microstep: 4969.60 | bwd_allreduce_microstep: 55.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 18:42:02,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 18:42:02,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.59 | bwd_microstep: 5170.40 | bwd_inner_microstep: 5084.62 | bwd_allreduce_microstep: 85.71 | 
step_microstep: 181.67 [2024-07-31 18:42:02,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28457.05 | bwd: 40947.32 | bwd_inner: 40266.70 | bwd_allreduce: 680.15 | step: 182.26 68%|██████▊ | 839/1230 [16:30:08<7:35:16, 69.86s/it] {'loss': 1.1648, 'learning_rate': 4.8487414678372315e-06, 'epoch': 0.68} 68%|██████▊ | 839/1230 [16:30:08<7:35:16, 69.86s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3955 [2024-07-31 18:42:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.15 | bwd_microstep: 5171.34 | bwd_inner_microstep: 5152.32 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3787 [2024-07-31 18:42:20,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.79 | bwd_microstep: 5254.88 | bwd_inner_microstep: 5206.10 | bwd_allreduce_microstep: 48.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2262 [2024-07-31 18:42:29,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.78 | bwd_microstep: 5344.65 | bwd_inner_microstep: 4929.99 | bwd_allreduce_microstep: 414.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-31 18:42:38,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.15 | bwd_microstep: 4988.41 | bwd_inner_microstep: 4969.02 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2198 [2024-07-31 18:42:47,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.74 | bwd_microstep: 5211.31 | bwd_inner_microstep: 4805.08 | bwd_allreduce_microstep: 406.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 18:42:55,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3735.20 | bwd_microstep: 5003.88 | bwd_inner_microstep: 4965.44 | bwd_allreduce_microstep: 38.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 18:43:04,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.73 | bwd_microstep: 5033.51 | bwd_inner_microstep: 4977.52 | bwd_allreduce_microstep: 55.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 18:43:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 18:43:13,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.84 | bwd_microstep: 4997.67 | bwd_inner_microstep: 4944.51 | bwd_allreduce_microstep: 53.08 | step_microstep: 181.37 [2024-07-31 18:43:13,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29175.28 | bwd: 41005.62 | bwd_inner: 39949.94 | bwd_allreduce: 1055.20 | step: 181.94 68%|██████▊ | 840/1230 [16:31:19<7:35:22, 70.06s/it] {'loss': 1.1159, 'learning_rate': 4.826188484858913e-06, 'epoch': 0.68} 68%|██████▊ | 840/1230 [16:31:19<7:35:22, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3944 [2024-07-31 18:43:22,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.74 | bwd_microstep: 5291.52 | bwd_inner_microstep: 5239.28 | bwd_allreduce_microstep: 52.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3840 [2024-07-31 18:43:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.90 | bwd_microstep: 5042.64 | bwd_inner_microstep: 5023.23 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-31 18:43:39,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.98 | bwd_microstep: 5149.19 | bwd_inner_microstep: 4749.10 
| bwd_allreduce_microstep: 400.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 18:43:48,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.26 | bwd_microstep: 5254.81 | bwd_inner_microstep: 4848.87 | bwd_allreduce_microstep: 405.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 18:43:57,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.40 | bwd_microstep: 5005.62 | bwd_inner_microstep: 4966.90 | bwd_allreduce_microstep: 38.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 18:44:05,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.95 | bwd_microstep: 4977.21 | bwd_inner_microstep: 4925.32 | bwd_allreduce_microstep: 51.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3702 [2024-07-31 18:44:14,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.25 | bwd_microstep: 4973.76 | bwd_inner_microstep: 4916.38 | bwd_allreduce_microstep: 57.31 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 18:44:23,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 18:44:23,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.32 | bwd_microstep: 5169.88 | bwd_inner_microstep: 5096.67 | bwd_allreduce_microstep: 73.13 | step_microstep: 181.08 [2024-07-31 18:44:23,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28873.70 | bwd: 40864.60 | bwd_inner: 39765.71 | bwd_allreduce: 1098.41 | step: 181.66 68%|██████▊ | 841/1230 [16:32:29<7:34:13, 70.06s/it] {'loss': 1.1565, 'learning_rate': 4.803671379961945e-06, 'epoch': 0.68} 68%|██████▊ | 841/1230 [16:32:29<7:34:13, 70.06s/it]dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3899 [2024-07-31 18:44:32,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.07 | bwd_microstep: 5060.53 | bwd_inner_microstep: 5032.44 | bwd_allreduce_microstep: 28.03 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3754 [2024-07-31 18:44:40,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.14 | bwd_microstep: 5056.78 | bwd_inner_microstep: 5020.87 | bwd_allreduce_microstep: 35.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 18:44:49,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.70 | bwd_microstep: 5228.22 | bwd_inner_microstep: 5148.86 | bwd_allreduce_microstep: 79.30 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3612 [2024-07-31 18:44:58,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.29 | bwd_microstep: 5087.08 | bwd_inner_microstep: 5034.46 | bwd_allreduce_microstep: 52.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 18:45:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.40 | bwd_microstep: 4881.64 | bwd_inner_microstep: 4862.31 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 18:45:14,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3185.07 | bwd_microstep: 4737.56 | bwd_inner_microstep: 4707.07 | bwd_allreduce_microstep: 30.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 18:45:23,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.53 | bwd_microstep: 4889.80 | bwd_inner_microstep: 4870.49 | 
bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3644 [2024-07-31 18:45:32,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 18:45:32,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.81 | bwd_microstep: 5089.62 | bwd_inner_microstep: 5003.35 | bwd_allreduce_microstep: 86.20 | step_microstep: 181.50 [2024-07-31 18:45:32,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28644.92 | bwd: 40031.21 | bwd_inner: 39679.78 | bwd_allreduce: 350.94 | step: 182.10 68%|██████▊ | 842/1230 [16:33:38<7:31:01, 69.75s/it] {'loss': 1.1528, 'learning_rate': 4.781190309292421e-06, 'epoch': 0.68} 68%|██████▊ | 842/1230 [16:33:38<7:31:01, 69.75s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3554 [2024-07-31 18:45:41,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.04 | bwd_microstep: 5444.88 | bwd_inner_microstep: 5275.35 | bwd_allreduce_microstep: 169.45 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791 [2024-07-31 18:45:50,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.78 | bwd_microstep: 5321.48 | bwd_inner_microstep: 5252.85 | bwd_allreduce_microstep: 68.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3796 [2024-07-31 18:45:59,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.86 | bwd_microstep: 5194.97 | bwd_inner_microstep: 5144.45 | bwd_allreduce_microstep: 50.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234 [2024-07-31 18:46:08,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.37 | bwd_microstep: 5182.92 | bwd_inner_microstep: 4777.91 | bwd_allreduce_microstep: 404.94 | step_microstep: 0.11 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 18:46:17,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.91 | bwd_microstep: 5215.13 | bwd_inner_microstep: 5132.49 | bwd_allreduce_microstep: 82.57 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2198 [2024-07-31 18:46:25,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.08 | bwd_microstep: 5229.70 | bwd_inner_microstep: 4823.01 | bwd_allreduce_microstep: 406.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-31 18:46:34,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.59 | bwd_microstep: 5278.87 | bwd_inner_microstep: 5259.56 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 18:46:43,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 18:46:43,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.87 | bwd_microstep: 4951.06 | bwd_inner_microstep: 4918.59 | bwd_allreduce_microstep: 32.40 | step_microstep: 183.22 [2024-07-31 18:46:43,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28919.41 | bwd: 41818.99 | bwd_inner: 40584.16 | bwd_allreduce: 1234.31 | step: 183.95 69%|██████▊ | 843/1230 [16:34:49<7:32:25, 70.14s/it] {'loss': 1.1858, 'learning_rate': 4.758745428746573e-06, 'epoch': 0.69} 69%|██████▊ | 843/1230 [16:34:49<7:32:25, 70.14s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3927 [2024-07-31 18:46:52,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.24 | bwd_microstep: 5287.08 | bwd_inner_microstep: 5232.80 | bwd_allreduce_microstep: 54.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3937 [2024-07-31 
18:47:01,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.77 | bwd_microstep: 5173.38 | bwd_inner_microstep: 5154.08 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 18:47:10,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.55 | bwd_microstep: 5199.32 | bwd_inner_microstep: 5113.69 | bwd_allreduce_microstep: 85.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-07-31 18:47:18,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.91 | bwd_microstep: 5162.41 | bwd_inner_microstep: 4762.29 | bwd_allreduce_microstep: 400.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-31 18:47:27,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.47 | bwd_microstep: 5003.33 | bwd_inner_microstep: 4968.23 | bwd_allreduce_microstep: 35.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2153 [2024-07-31 18:47:36,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.69 | bwd_microstep: 5093.23 | bwd_inner_microstep: 4697.42 | bwd_allreduce_microstep: 395.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 18:47:44,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.10 | bwd_microstep: 5059.45 | bwd_inner_microstep: 4664.86 | bwd_allreduce_microstep: 394.53 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3695 [2024-07-31 18:47:53,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-31 18:47:53,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.60 | bwd_microstep: 5163.65 | 
bwd_inner_microstep: 5071.61 | bwd_allreduce_microstep: 91.97 | step_microstep: 182.57 [2024-07-31 18:47:53,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28734.24 | bwd: 41141.84 | bwd_inner: 39664.91 | bwd_allreduce: 1476.44 | step: 183.16 69%|██████▊ | 844/1230 [16:35:59<7:31:22, 70.16s/it] {'loss': 1.133, 'learning_rate': 4.736336893969652e-06, 'epoch': 0.69} 69%|██████▊ | 844/1230 [16:35:59<7:31:22, 70.16s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3937 [2024-07-31 18:48:02,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.68 | bwd_microstep: 5178.34 | bwd_inner_microstep: 5155.97 | bwd_allreduce_microstep: 22.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-31 18:48:11,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.80 | bwd_microstep: 5186.17 | bwd_inner_microstep: 5100.42 | bwd_allreduce_microstep: 85.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-31 18:48:20,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.92 | bwd_microstep: 5199.94 | bwd_inner_microstep: 5108.55 | bwd_allreduce_microstep: 91.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-31 18:48:29,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.58 | bwd_microstep: 5200.47 | bwd_inner_microstep: 4795.29 | bwd_allreduce_microstep: 405.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 18:48:37,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3437.31 | bwd_microstep: 4839.67 | bwd_inner_microstep: 4817.89 | bwd_allreduce_microstep: 21.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 
18:48:45,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3173.42 | bwd_microstep: 4699.78 | bwd_inner_microstep: 4673.21 | bwd_allreduce_microstep: 26.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 18:48:54,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.94 | bwd_microstep: 5148.19 | bwd_inner_microstep: 5079.50 | bwd_allreduce_microstep: 68.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 18:49:02,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 18:49:02,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.92 | bwd_microstep: 4877.55 | bwd_inner_microstep: 4858.30 | bwd_allreduce_microstep: 19.18 | step_microstep: 181.97 [2024-07-31 18:49:02,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28507.47 | bwd: 40330.10 | bwd_inner: 39589.07 | bwd_allreduce: 740.54 | step: 182.55 69%|██████▊ | 845/1230 [16:37:08<7:28:17, 69.86s/it] {'loss': 1.1275, 'learning_rate': 4.7139648603548925e-06, 'epoch': 0.69} 69%|██████▊ | 845/1230 [16:37:08<7:28:17, 69.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3973 [2024-07-31 18:49:12,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.51 | bwd_microstep: 5455.14 | bwd_inner_microstep: 5398.44 | bwd_allreduce_microstep: 56.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-31 18:49:21,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.78 | bwd_microstep: 5430.60 | bwd_inner_microstep: 5342.98 | bwd_allreduce_microstep: 87.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 18:49:30,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3631.01 | bwd_microstep: 5213.44 | bwd_inner_microstep: 5152.35 | bwd_allreduce_microstep: 61.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 18:49:37,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3184.00 | bwd_microstep: 4635.83 | bwd_inner_microstep: 4616.47 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 18:49:45,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.44 | bwd_microstep: 4791.53 | bwd_inner_microstep: 4772.16 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-31 18:49:54,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.64 | bwd_microstep: 5132.64 | bwd_inner_microstep: 5081.57 | bwd_allreduce_microstep: 51.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 18:50:03,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.14 | bwd_microstep: 5161.57 | bwd_inner_microstep: 5095.67 | bwd_allreduce_microstep: 65.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-31 18:50:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 18:50:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.06 | bwd_microstep: 5142.67 | bwd_inner_microstep: 5069.60 | bwd_allreduce_microstep: 73.00 | step_microstep: 182.38 [2024-07-31 18:50:12,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28219.48 | bwd: 40963.40 | bwd_inner: 40529.17 | bwd_allreduce: 433.74 | step: 182.96 69%|██████▉ | 846/1230 [16:38:18<7:26:27, 69.76s/it] {'loss': 1.1115, 'learning_rate': 4.691629483042387e-06, 'epoch': 
0.69} 69%|██████▉ | 846/1230 [16:38:18<7:26:27, 69.76s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3962 [2024-07-31 18:50:21,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.82 | bwd_microstep: 5234.08 | bwd_inner_microstep: 5214.92 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2349 [2024-07-31 18:50:29,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3114.10 | bwd_microstep: 5224.04 | bwd_inner_microstep: 4824.71 | bwd_allreduce_microstep: 399.26 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3735 [2024-07-31 18:50:38,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.72 | bwd_microstep: 5231.10 | bwd_inner_microstep: 5178.98 | bwd_allreduce_microstep: 52.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 18:50:47,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.61 | bwd_microstep: 5117.75 | bwd_inner_microstep: 5051.70 | bwd_allreduce_microstep: 65.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-31 18:50:56,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.22 | bwd_microstep: 5114.83 | bwd_inner_microstep: 5040.27 | bwd_allreduce_microstep: 74.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 18:51:04,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.91 | bwd_microstep: 5026.39 | bwd_inner_microstep: 4985.16 | bwd_allreduce_microstep: 41.14 | step_microstep: 0.13 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2917 [2024-07-31 18:51:13,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3501.97 | bwd_microstep: 5038.53 | bwd_inner_microstep: 4688.74 | bwd_allreduce_microstep: 349.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 18:51:22,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 18:51:22,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.35 | bwd_microstep: 4904.70 | bwd_inner_microstep: 4885.30 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.19 [2024-07-31 18:51:22,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28515.63 | bwd: 40891.40 | bwd_inner: 39869.72 | bwd_allreduce: 1021.18 | step: 181.81 69%|██████▉ | 847/1230 [16:39:28<7:25:17, 69.76s/it] {'loss': 1.1297, 'learning_rate': 4.669330916918047e-06, 'epoch': 0.69} 69%|██████▉ | 847/1230 [16:39:28<7:25:17, 69.76s/it]dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1969 [2024-07-31 18:51:31,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.36 | bwd_microstep: 5394.34 | bwd_inner_microstep: 4978.18 | bwd_allreduce_microstep: 416.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 18:51:39,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.44 | bwd_microstep: 5163.09 | bwd_inner_microstep: 4763.61 | bwd_allreduce_microstep: 399.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-31 18:51:48,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.33 | bwd_microstep: 5025.46 | bwd_inner_microstep: 5006.07 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-31 18:51:57,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.87 | bwd_microstep: 5113.51 | bwd_inner_microstep: 5070.83 | 
bwd_allreduce_microstep: 42.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 18:52:06,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.94 | bwd_microstep: 5217.61 | bwd_inner_microstep: 4811.26 | bwd_allreduce_microstep: 406.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 18:52:14,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.76 | bwd_microstep: 5002.40 | bwd_inner_microstep: 4981.92 | bwd_allreduce_microstep: 20.41 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 18:52:23,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.23 | bwd_microstep: 5112.42 | bwd_inner_microstep: 4715.88 | bwd_allreduce_microstep: 396.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3665 [2024-07-31 18:52:32,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.84 [2024-07-31 18:52:32,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.36 | bwd_microstep: 4867.95 | bwd_inner_microstep: 4848.51 | bwd_allreduce_microstep: 19.36 | step_microstep: 182.93 [2024-07-31 18:52:32,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28826.19 | bwd: 40896.77 | bwd_inner: 39176.21 | bwd_allreduce: 1720.07 | step: 183.52 69%|██████▉ | 848/1230 [16:40:38<7:24:41, 69.85s/it] {'loss': 1.1774, 'learning_rate': 4.647069316612499e-06, 'epoch': 0.69} 69%|██████▉ | 848/1230 [16:40:38<7:24:41, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2425 [2024-07-31 18:52:41,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.81 | bwd_microstep: 5675.10 | bwd_inner_microstep: 5239.34 | bwd_allreduce_microstep: 435.69 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3562 [2024-07-31 18:52:50,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.04 | bwd_microstep: 5134.34 | bwd_inner_microstep: 5056.19 | bwd_allreduce_microstep: 78.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3613 [2024-07-31 18:52:59,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.87 | bwd_microstep: 5172.98 | bwd_inner_microstep: 5078.71 | bwd_allreduce_microstep: 94.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3736 [2024-07-31 18:53:07,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3121.87 | bwd_microstep: 4976.97 | bwd_inner_microstep: 4926.75 | bwd_allreduce_microstep: 50.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 18:53:15,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.67 | bwd_microstep: 4984.06 | bwd_inner_microstep: 4931.12 | bwd_allreduce_microstep: 52.87 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 18:53:24,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.21 | bwd_microstep: 5191.14 | bwd_inner_microstep: 4788.97 | bwd_allreduce_microstep: 402.10 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2122 [2024-07-31 18:53:33,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.32 | bwd_microstep: 5093.01 | bwd_inner_microstep: 4696.11 | bwd_allreduce_microstep: 396.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 18:53:41,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 18:53:41,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3596.52 | bwd_microstep: 5053.77 | bwd_inner_microstep: 4994.79 | bwd_allreduce_microstep: 58.91 | step_microstep: 181.76 [2024-07-31 18:53:41,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28127.21 | bwd: 41281.34 | bwd_inner: 39711.93 | bwd_allreduce: 1568.93 | step: 182.36 69%|██████▉ | 849/1230 [16:41:47<7:23:18, 69.81s/it] {'loss': 1.1648, 'learning_rate': 4.624844836500052e-06, 'epoch': 0.69} 69%|██████▉ | 849/1230 [16:41:47<7:23:18, 69.81s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 18:53:51,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3849.24 | bwd_microstep: 5340.95 | bwd_inner_microstep: 5321.86 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825 [2024-07-31 18:53:59,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.53 | bwd_microstep: 5177.88 | bwd_inner_microstep: 5128.44 | bwd_allreduce_microstep: 49.38 | step_microstep: 0.23 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 18:54:08,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.33 | bwd_microstep: 5179.51 | bwd_inner_microstep: 5122.44 | bwd_allreduce_microstep: 57.00 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 18:54:17,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.98 | bwd_microstep: 5111.64 | bwd_inner_microstep: 5080.49 | bwd_allreduce_microstep: 31.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-31 18:54:26,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.73 | bwd_microstep: 5068.04 | bwd_inner_microstep: 4675.97 | bwd_allreduce_microstep: 391.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3644 [2024-07-31 18:54:34,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.91 | bwd_microstep: 5124.97 | bwd_inner_microstep: 5048.95 | bwd_allreduce_microstep: 75.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 18:54:43,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.67 | bwd_microstep: 5140.08 | bwd_inner_microstep: 4743.81 | bwd_allreduce_microstep: 396.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-07-31 18:54:52,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 18:54:52,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.16 | bwd_microstep: 4992.61 | bwd_inner_microstep: 4962.14 | bwd_allreduce_microstep: 30.40 | step_microstep: 182.12 [2024-07-31 18:54:52,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29061.48 | bwd: 41135.64 | bwd_inner: 40084.04 | bwd_allreduce: 1051.12 | step: 182.85 69%|██████▉ | 850/1230 [16:42:58<7:23:30, 70.03s/it] {'loss': 1.1578, 'learning_rate': 4.60265763069758e-06, 'epoch': 0.69} 69%|██████▉ | 850/1230 [16:42:58<7:23:30, 70.03s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2330 [2024-07-31 18:55:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.65 | bwd_microstep: 5301.43 | bwd_inner_microstep: 4893.48 | bwd_allreduce_microstep: 407.88 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2815 [2024-07-31 18:55:10,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.73 | bwd_microstep: 5259.25 | bwd_inner_microstep: 4851.49 | bwd_allreduce_microstep: 407.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-31 18:55:19,047] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.55 | bwd_microstep: 5228.63 | bwd_inner_microstep: 4822.87 | bwd_allreduce_microstep: 405.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 18:55:27,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.84 | bwd_microstep: 5020.33 | bwd_inner_microstep: 5001.00 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3626 [2024-07-31 18:55:36,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.87 | bwd_microstep: 5158.80 | bwd_inner_microstep: 5065.81 | bwd_allreduce_microstep: 92.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 18:55:45,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.26 | bwd_microstep: 5051.73 | bwd_inner_microstep: 5008.90 | bwd_allreduce_microstep: 42.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 18:55:53,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.55 | bwd_microstep: 4887.74 | bwd_inner_microstep: 4868.34 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3670 [2024-07-31 18:56:02,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 18:56:02,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.96 | bwd_microstep: 4843.27 | bwd_inner_microstep: 4800.17 | bwd_allreduce_microstep: 43.03 | step_microstep: 182.01 [2024-07-31 18:56:02,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28509.32 | bwd: 40751.17 | bwd_inner: 39312.02 | bwd_allreduce: 1438.67 | step: 182.60 69%|██████▉ | 851/1230 [16:44:07<7:21:30, 69.90s/it] {'loss': 1.165, 
'learning_rate': 4.580507853063487e-06, 'epoch': 0.69} 69%|██████▉ | 851/1230 [16:44:07<7:21:30, 69.90s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 18:56:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.51 | bwd_microstep: 5373.04 | bwd_inner_microstep: 5353.91 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572 [2024-07-31 18:56:20,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.41 | bwd_microstep: 5131.79 | bwd_inner_microstep: 5053.41 | bwd_allreduce_microstep: 78.32 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2843 [2024-07-31 18:56:27,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.93 | bwd_microstep: 4837.43 | bwd_inner_microstep: 4572.68 | bwd_allreduce_microstep: 264.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 18:56:36,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.81 | bwd_microstep: 5034.02 | bwd_inner_microstep: 4991.11 | bwd_allreduce_microstep: 42.84 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 18:56:45,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.88 | bwd_microstep: 5015.49 | bwd_inner_microstep: 4956.81 | bwd_allreduce_microstep: 58.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 18:56:54,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.45 | bwd_microstep: 5155.69 | bwd_inner_microstep: 5078.48 | bwd_allreduce_microstep: 77.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 18:57:02,758] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.43 | bwd_microstep: 5053.66 | bwd_inner_microstep: 4995.48 | bwd_allreduce_microstep: 58.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 18:57:11,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 18:57:11,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.71 | bwd_microstep: 5186.40 | bwd_inner_microstep: 4781.62 | bwd_allreduce_microstep: 404.71 | step_microstep: 182.92 [2024-07-31 18:57:11,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28469.04 | bwd: 40787.50 | bwd_inner: 39783.44 | bwd_allreduce: 1003.58 | step: 183.51 69%|██████▉ | 852/1230 [16:45:17<7:19:46, 69.81s/it] {'loss': 1.1498, 'learning_rate': 4.5583956571966295e-06, 'epoch': 0.69} 69%|██████▉ | 852/1230 [16:45:17<7:19:46, 69.81s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3533 [2024-07-31 18:57:20,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3378.21 | bwd_microstep: 5164.13 | bwd_inner_microstep: 5079.47 | bwd_allreduce_microstep: 84.58 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3566 [2024-07-31 18:57:28,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.81 | bwd_microstep: 5157.48 | bwd_inner_microstep: 5073.86 | bwd_allreduce_microstep: 83.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825 [2024-07-31 18:57:37,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.99 | bwd_microstep: 5153.51 | bwd_inner_microstep: 5107.05 | bwd_allreduce_microstep: 46.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 18:57:46,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3394.99 | 
bwd_microstep: 4894.78 | bwd_inner_microstep: 4856.17 | bwd_allreduce_microstep: 38.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 18:57:54,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.07 | bwd_microstep: 5143.21 | bwd_inner_microstep: 5076.23 | bwd_allreduce_microstep: 66.92 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 18:58:03,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.90 | bwd_microstep: 5107.67 | bwd_inner_microstep: 5044.56 | bwd_allreduce_microstep: 63.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-31 18:58:12,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.30 | bwd_microstep: 4977.86 | bwd_inner_microstep: 4945.68 | bwd_allreduce_microstep: 32.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-31 18:58:21,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 18:58:21,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.58 | bwd_microstep: 5035.62 | bwd_inner_microstep: 4981.72 | bwd_allreduce_microstep: 53.83 | step_microstep: 181.48 [2024-07-31 18:58:21,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28437.77 | bwd: 40634.24 | bwd_inner: 40164.67 | bwd_allreduce: 469.09 | step: 182.18 69%|██████▉ | 853/1230 [16:46:26<7:17:51, 69.69s/it] {'loss': 1.1535, 'learning_rate': 4.5363211964352524e-06, 'epoch': 0.69} 69%|██████▉ | 853/1230 [16:46:26<7:17:51, 69.69s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 18:58:30,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3870.65 | bwd_microstep: 5337.65 | bwd_inner_microstep: 5318.58 | 
bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-31 18:58:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.30 | bwd_microstep: 5347.65 | bwd_inner_microstep: 4934.64 | bwd_allreduce_microstep: 412.95 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 18:58:47,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.44 | bwd_microstep: 5060.21 | bwd_inner_microstep: 5021.84 | bwd_allreduce_microstep: 38.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-31 18:58:56,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.77 | bwd_microstep: 5032.57 | bwd_inner_microstep: 5005.66 | bwd_allreduce_microstep: 26.85 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3755 [2024-07-31 18:59:05,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.79 | bwd_microstep: 5007.70 | bwd_inner_microstep: 4980.54 | bwd_allreduce_microstep: 27.09 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3610 [2024-07-31 18:59:14,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.94 | bwd_microstep: 5096.46 | bwd_inner_microstep: 5019.68 | bwd_allreduce_microstep: 76.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 18:59:22,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.25 | bwd_microstep: 5054.62 | bwd_inner_microstep: 5010.95 | bwd_allreduce_microstep: 43.61 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3691 [2024-07-31 18:59:31,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 
[2024-07-31 18:59:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.61 | bwd_microstep: 5335.54 | bwd_inner_microstep: 5172.14 | bwd_allreduce_microstep: 163.33 | step_microstep: 182.06 [2024-07-31 18:59:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29250.65 | bwd: 41272.40 | bwd_inner: 40463.96 | bwd_allreduce: 807.96 | step: 182.65 69%|██████▉ | 854/1230 [16:47:37<7:18:53, 70.04s/it] {'loss': 1.1439, 'learning_rate': 4.514284623855915e-06, 'epoch': 0.69} 69%|██████▉ | 854/1230 [16:47:37<7:18:53, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3912 [2024-07-31 18:59:40,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3820.01 | bwd_microstep: 5155.84 | bwd_inner_microstep: 5135.46 | bwd_allreduce_microstep: 20.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2296 [2024-07-31 18:59:49,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.27 | bwd_microstep: 5272.24 | bwd_inner_microstep: 4863.30 | bwd_allreduce_microstep: 408.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3831 [2024-07-31 18:59:58,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.56 | bwd_microstep: 5048.70 | bwd_inner_microstep: 5029.38 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3780 [2024-07-31 19:00:06,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3127.46 | bwd_microstep: 4966.87 | bwd_inner_microstep: 4923.29 | bwd_allreduce_microstep: 43.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 19:00:15,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.08 | bwd_microstep: 4979.75 | bwd_inner_microstep: 4960.40 | 
bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 19:00:24,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.72 | bwd_microstep: 5140.80 | bwd_inner_microstep: 4742.07 | bwd_allreduce_microstep: 398.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 19:00:32,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.71 | bwd_microstep: 4882.33 | bwd_inner_microstep: 4862.93 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-31 19:00:40,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-31 19:00:40,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3018.28 | bwd_microstep: 4896.48 | bwd_inner_microstep: 4518.14 | bwd_allreduce_microstep: 378.27 | step_microstep: 181.37 [2024-07-31 19:00:40,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28230.00 | bwd: 40342.99 | bwd_inner: 39034.92 | bwd_allreduce: 1307.58 | step: 181.95 70%|██████▉ | 855/1230 [16:48:46<7:15:36, 69.70s/it] {'loss': 1.1434, 'learning_rate': 4.4922860922724466e-06, 'epoch': 0.7} 70%|██████▉ | 855/1230 [16:48:46<7:15:36, 69.70s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3925 [2024-07-31 19:00:49,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.81 | bwd_microstep: 5152.74 | bwd_inner_microstep: 5133.65 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2279 [2024-07-31 19:00:57,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.66 | bwd_microstep: 4880.54 | bwd_inner_microstep: 4503.09 | bwd_allreduce_microstep: 377.37 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 19:01:06,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.80 | bwd_microstep: 4994.00 | bwd_inner_microstep: 4974.19 | bwd_allreduce_microstep: 19.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-31 19:01:15,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.71 | bwd_microstep: 5136.12 | bwd_inner_microstep: 5090.75 | bwd_allreduce_microstep: 45.30 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2074 [2024-07-31 19:01:23,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.33 | bwd_microstep: 4976.38 | bwd_inner_microstep: 4595.67 | bwd_allreduce_microstep: 380.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 19:01:31,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.44 | bwd_microstep: 4907.19 | bwd_inner_microstep: 4887.78 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 19:01:40,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.52 | bwd_microstep: 5028.14 | bwd_inner_microstep: 4971.63 | bwd_allreduce_microstep: 56.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 19:01:49,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 19:01:49,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.09 | bwd_microstep: 5052.21 | bwd_inner_microstep: 4985.81 | bwd_allreduce_microstep: 66.33 | step_microstep: 181.08 [2024-07-31 19:01:49,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28012.26 | bwd: 40127.29 | bwd_inner: 39142.54 | bwd_allreduce: 984.27 | 
step: 181.65 70%|██████▉ | 856/1230 [16:49:55<7:12:08, 69.33s/it] {'loss': 1.2336, 'learning_rate': 4.470325754234881e-06, 'epoch': 0.7} 70%|██████▉ | 856/1230 [16:49:55<7:12:08, 69.33s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3976 [2024-07-31 19:01:58,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3870.64 | bwd_microstep: 5289.56 | bwd_inner_microstep: 5270.47 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-31 19:02:07,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.21 | bwd_microstep: 5226.61 | bwd_inner_microstep: 5176.96 | bwd_allreduce_microstep: 49.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 19:02:15,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3150.84 | bwd_microstep: 4794.10 | bwd_inner_microstep: 4754.36 | bwd_allreduce_microstep: 39.67 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1069 [2024-07-31 19:02:24,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.94 | bwd_microstep: 5270.72 | bwd_inner_microstep: 4863.51 | bwd_allreduce_microstep: 407.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 19:02:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.42 | bwd_microstep: 5126.45 | bwd_inner_microstep: 5053.45 | bwd_allreduce_microstep: 72.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 19:02:40,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3055.39 | bwd_microstep: 5036.07 | bwd_inner_microstep: 4647.09 | bwd_allreduce_microstep: 388.90 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 
8.0, dynamic token length: 3728 [2024-07-31 19:02:49,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.11 | bwd_microstep: 4995.83 | bwd_inner_microstep: 4944.71 | bwd_allreduce_microstep: 51.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669 [2024-07-31 19:02:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 19:02:58,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.17 | bwd_microstep: 5065.04 | bwd_inner_microstep: 4984.55 | bwd_allreduce_microstep: 80.42 | step_microstep: 182.08 [2024-07-31 19:02:58,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27956.63 | bwd: 40804.35 | bwd_inner: 39695.04 | bwd_allreduce: 1108.81 | step: 182.67 70%|██████▉ | 857/1230 [16:51:04<7:10:32, 69.26s/it] {'loss': 1.1662, 'learning_rate': 4.448403762028396e-06, 'epoch': 0.7} 70%|██████▉ | 857/1230 [16:51:04<7:10:32, 69.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4059 [2024-07-31 19:03:07,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.11 | bwd_microstep: 5574.42 | bwd_inner_microstep: 5511.63 | bwd_allreduce_microstep: 62.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815 [2024-07-31 19:03:15,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3261.97 | bwd_microstep: 4893.39 | bwd_inner_microstep: 4869.09 | bwd_allreduce_microstep: 24.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 19:03:24,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.55 | bwd_microstep: 5302.44 | bwd_inner_microstep: 4889.62 | bwd_allreduce_microstep: 412.76 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 19:03:33,683] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.15 | bwd_microstep: 5097.20 | bwd_inner_microstep: 5072.43 | bwd_allreduce_microstep: 24.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 19:03:42,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.74 | bwd_microstep: 5131.08 | bwd_inner_microstep: 5063.00 | bwd_allreduce_microstep: 68.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 19:03:51,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.12 | bwd_microstep: 5218.60 | bwd_inner_microstep: 4812.82 | bwd_allreduce_microstep: 405.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-31 19:03:59,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.55 | bwd_microstep: 5054.94 | bwd_inner_microstep: 4998.59 | bwd_allreduce_microstep: 56.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2120 [2024-07-31 19:04:08,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 19:04:08,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.03 | bwd_microstep: 5092.83 | bwd_inner_microstep: 4697.74 | bwd_allreduce_microstep: 395.02 | step_microstep: 182.84 [2024-07-31 19:04:08,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28602.12 | bwd: 41364.89 | bwd_inner: 39914.85 | bwd_allreduce: 1449.55 | step: 183.53 70%|██████▉ | 858/1230 [16:52:14<7:11:19, 69.57s/it] {'loss': 1.126, 'learning_rate': 4.4265202676722475e-06, 'epoch': 0.7} 70%|██████▉ | 858/1230 [16:52:14<7:11:19, 69.57s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-31 19:04:17,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3360.71 | 
bwd_microstep: 5363.03 | bwd_inner_microstep: 5264.27 | bwd_allreduce_microstep: 98.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 19:04:25,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3266.27 | bwd_microstep: 5199.11 | bwd_inner_microstep: 5118.41 | bwd_allreduce_microstep: 80.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815 [2024-07-31 19:04:34,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.57 | bwd_microstep: 5158.63 | bwd_inner_microstep: 5109.56 | bwd_allreduce_microstep: 49.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-31 19:04:43,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.09 | bwd_microstep: 5090.38 | bwd_inner_microstep: 5044.02 | bwd_allreduce_microstep: 46.29 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2176 [2024-07-31 19:04:51,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.23 | bwd_microstep: 5061.39 | bwd_inner_microstep: 4665.82 | bwd_allreduce_microstep: 395.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 19:05:00,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.89 | bwd_microstep: 5047.86 | bwd_inner_microstep: 5007.30 | bwd_allreduce_microstep: 40.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 19:05:09,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.81 | bwd_microstep: 5023.25 | bwd_inner_microstep: 4970.77 | bwd_allreduce_microstep: 52.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 19:05:18,047] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 19:05:18,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.12 | bwd_microstep: 5076.62 | bwd_inner_microstep: 5015.47 | bwd_allreduce_microstep: 61.08 | step_microstep: 182.43 [2024-07-31 19:05:18,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28012.59 | bwd: 41020.25 | bwd_inner: 40195.57 | bwd_allreduce: 824.20 | step: 183.02 70%|██████▉ | 859/1230 [16:53:23<7:09:46, 69.51s/it] {'loss': 1.1529, 'learning_rate': 4.40467542291874e-06, 'epoch': 0.7} 70%|██████▉ | 859/1230 [16:53:23<7:09:46, 69.51s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3900 [2024-07-31 19:05:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3863.34 | bwd_microstep: 5425.66 | bwd_inner_microstep: 5372.01 | bwd_allreduce_microstep: 53.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801 [2024-07-31 19:05:36,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.17 | bwd_microstep: 5015.24 | bwd_inner_microstep: 4995.82 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-31 19:05:45,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.98 | bwd_microstep: 5065.10 | bwd_inner_microstep: 5035.39 | bwd_allreduce_microstep: 29.64 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2094 [2024-07-31 19:05:53,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.94 | bwd_microstep: 5181.77 | bwd_inner_microstep: 4778.46 | bwd_allreduce_microstep: 403.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-31 19:06:02,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.56 | 
bwd_microstep: 5177.88 | bwd_inner_microstep: 5095.34 | bwd_allreduce_microstep: 82.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3667 [2024-07-31 19:06:11,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.56 | bwd_microstep: 4867.00 | bwd_inner_microstep: 4847.59 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2125 [2024-07-31 19:06:19,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.85 | bwd_microstep: 5285.53 | bwd_inner_microstep: 4874.03 | bwd_allreduce_microstep: 411.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 19:06:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 19:06:28,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.32 | bwd_microstep: 5132.27 | bwd_inner_microstep: 4734.73 | bwd_allreduce_microstep: 397.47 | step_microstep: 181.55 [2024-07-31 19:06:28,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29264.64 | bwd: 41150.44 | bwd_inner: 39733.31 | bwd_allreduce: 1416.64 | step: 182.13 70%|██████▉ | 860/1230 [16:54:34<7:10:55, 69.88s/it] {'loss': 1.1493, 'learning_rate': 4.382869379252152e-06, 'epoch': 0.7} 70%|██████▉ | 860/1230 [16:54:34<7:10:55, 69.88s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 19:06:37,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.20 | bwd_microstep: 5239.59 | bwd_inner_microstep: 5220.56 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3967 [2024-07-31 19:06:46,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.68 | bwd_microstep: 5185.80 | bwd_inner_microstep: 5166.47 | bwd_allreduce_microstep: 
19.26 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3767 [2024-07-31 19:06:55,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.32 | bwd_microstep: 5153.46 | bwd_inner_microstep: 5112.46 | bwd_allreduce_microstep: 40.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 19:07:04,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.78 | bwd_microstep: 4969.11 | bwd_inner_microstep: 4938.27 | bwd_allreduce_microstep: 30.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-31 19:07:12,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.86 | bwd_microstep: 5070.25 | bwd_inner_microstep: 5003.35 | bwd_allreduce_microstep: 66.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739 [2024-07-31 19:07:21,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.32 | bwd_microstep: 4988.21 | bwd_inner_microstep: 4968.79 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3719 [2024-07-31 19:07:30,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.17 | bwd_microstep: 5105.56 | bwd_inner_microstep: 5038.21 | bwd_allreduce_microstep: 67.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 19:07:39,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 19:07:39,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.46 | bwd_microstep: 5170.65 | bwd_inner_microstep: 4768.74 | bwd_allreduce_microstep: 401.84 | step_microstep: 182.04 [2024-07-31 19:07:39,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29279.68 | bwd: 
40882.62 | bwd_inner: 40216.78 | bwd_allreduce: 665.34 | step: 182.62 70%|███████ | 861/1230 [16:55:45<7:10:53, 70.06s/it] {'loss': 1.1482, 'learning_rate': 4.3611022878877015e-06, 'epoch': 0.7} 70%|███████ | 861/1230 [16:55:45<7:10:53, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-31 19:07:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.95 | bwd_microstep: 5175.37 | bwd_inner_microstep: 5094.76 | bwd_allreduce_microstep: 80.54 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-31 19:07:57,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.90 | bwd_microstep: 5159.16 | bwd_inner_microstep: 5125.03 | bwd_allreduce_microstep: 34.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-31 19:08:05,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.12 | bwd_microstep: 5041.35 | bwd_inner_microstep: 5021.98 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 19:08:14,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.45 | bwd_microstep: 5144.73 | bwd_inner_microstep: 5090.52 | bwd_allreduce_microstep: 54.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 19:08:22,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.41 | bwd_microstep: 4700.11 | bwd_inner_microstep: 4674.73 | bwd_allreduce_microstep: 25.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 19:08:31,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.16 | bwd_microstep: 5105.37 | bwd_inner_microstep: 4708.45 | bwd_allreduce_microstep: 396.85 | 
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 19:08:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.48 | bwd_microstep: 4888.62 | bwd_inner_microstep: 4866.48 | bwd_allreduce_microstep: 22.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-31 19:08:48,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81 [2024-07-31 19:08:48,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.17 | bwd_microstep: 4986.41 | bwd_inner_microstep: 4967.00 | bwd_allreduce_microstep: 19.34 | step_microstep: 181.74 [2024-07-31 19:08:48,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28876.54 | bwd: 40201.10 | bwd_inner: 39548.90 | bwd_allreduce: 651.71 | step: 182.32 70%|███████ | 862/1230 [16:56:54<7:08:31, 69.87s/it] {'loss': 1.1408, 'learning_rate': 4.339374299770473e-06, 'epoch': 0.7} 70%|███████ | 862/1230 [16:56:54<7:08:31, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4085 [2024-07-31 19:08:57,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.39 | bwd_microstep: 5196.32 | bwd_inner_microstep: 5177.19 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 19:09:06,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.60 | bwd_microstep: 5201.41 | bwd_inner_microstep: 5149.79 | bwd_allreduce_microstep: 51.56 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2842 [2024-07-31 19:09:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.02 | bwd_microstep: 5267.17 | bwd_inner_microstep: 4859.18 | bwd_allreduce_microstep: 407.92 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 8.0, dynamic 
token length: 3752 [2024-07-31 19:09:24,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.11 | bwd_microstep: 5142.58 | bwd_inner_microstep: 5072.08 | bwd_allreduce_microstep: 70.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 19:09:32,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.92 | bwd_microstep: 5167.29 | bwd_inner_microstep: 5086.51 | bwd_allreduce_microstep: 80.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 19:09:41,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.95 | bwd_microstep: 5163.80 | bwd_inner_microstep: 5086.11 | bwd_allreduce_microstep: 77.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 19:09:50,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.43 | bwd_microstep: 4982.96 | bwd_inner_microstep: 4935.99 | bwd_allreduce_microstep: 46.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 19:09:59,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 19:09:59,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.62 | bwd_microstep: 4880.18 | bwd_inner_microstep: 4860.17 | bwd_allreduce_microstep: 19.94 | step_microstep: 181.82 [2024-07-31 19:09:59,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29089.96 | bwd: 41001.70 | bwd_inner: 40226.95 | bwd_allreduce: 774.26 | step: 182.52 70%|███████ | 863/1230 [16:58:05<7:08:22, 70.03s/it] {'loss': 1.1513, 'learning_rate': 4.317685565574413e-06, 'epoch': 0.7} 70%|███████ | 863/1230 [16:58:05<7:08:22, 70.03s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2456 [2024-07-31 19:10:08,119] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3635.60 | bwd_microstep: 5336.50 | bwd_inner_microstep: 4924.59 | bwd_allreduce_microstep: 411.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2500 [2024-07-31 19:10:16,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.81 | bwd_microstep: 5274.61 | bwd_inner_microstep: 4864.45 | bwd_allreduce_microstep: 410.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-31 19:10:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.08 | bwd_microstep: 5022.04 | bwd_inner_microstep: 5002.66 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3782 [2024-07-31 19:10:34,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3340.42 | bwd_microstep: 5058.29 | bwd_inner_microstep: 4993.55 | bwd_allreduce_microstep: 64.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-31 19:10:42,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.98 | bwd_microstep: 5175.20 | bwd_inner_microstep: 4772.45 | bwd_allreduce_microstep: 402.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-31 19:10:51,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.07 | bwd_microstep: 4872.86 | bwd_inner_microstep: 4853.51 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 19:11:00,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.49 | bwd_microstep: 5093.74 | bwd_inner_microstep: 5026.29 | bwd_allreduce_microstep: 67.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 
[2024-07-31 19:11:09,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 19:11:09,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.77 | bwd_microstep: 5134.07 | bwd_inner_microstep: 4737.82 | bwd_allreduce_microstep: 396.17 | step_microstep: 181.71 [2024-07-31 19:11:09,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28653.14 | bwd: 40967.28 | bwd_inner: 39175.26 | bwd_allreduce: 1791.53 | step: 182.30 70%|███████ | 864/1230 [16:59:14<7:07:03, 70.01s/it] {'loss': 1.1312, 'learning_rate': 4.296036235701235e-06, 'epoch': 0.7} 70%|███████ | 864/1230 [16:59:14<7:07:03, 70.01s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 19:11:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.39 | bwd_microstep: 5388.37 | bwd_inner_microstep: 5363.89 | bwd_allreduce_microstep: 24.42 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3584 [2024-07-31 19:11:26,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3407.75 | bwd_microstep: 4892.68 | bwd_inner_microstep: 4855.38 | bwd_allreduce_microstep: 37.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 19:11:35,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.34 | bwd_microstep: 5223.60 | bwd_inner_microstep: 4818.00 | bwd_allreduce_microstep: 405.54 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3295 [2024-07-31 19:11:44,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.27 | bwd_microstep: 5229.26 | bwd_inner_microstep: 5021.04 | bwd_allreduce_microstep: 208.15 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3753 [2024-07-31 19:11:52,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3461.78 | bwd_microstep: 4893.88 | bwd_inner_microstep: 4874.55 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 19:12:01,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.34 | bwd_microstep: 5115.25 | bwd_inner_microstep: 5048.66 | bwd_allreduce_microstep: 66.53 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1123 [2024-07-31 19:12:10,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.07 | bwd_microstep: 5200.26 | bwd_inner_microstep: 4798.87 | bwd_allreduce_microstep: 401.32 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2143 [2024-07-31 19:12:18,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 19:12:18,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.04 | bwd_microstep: 5059.40 | bwd_inner_microstep: 4667.05 | bwd_allreduce_microstep: 392.29 | step_microstep: 208.55 [2024-07-31 19:12:18,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28530.87 | bwd: 41002.70 | bwd_inner: 39447.37 | bwd_allreduce: 1554.84 | step: 209.13 70%|███████ | 865/1230 [17:00:24<7:05:40, 69.97s/it] {'loss': 1.1597, 'learning_rate': 4.274426460279412e-06, 'epoch': 0.7} 70%|███████ | 865/1230 [17:00:24<7:05:40, 69.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 19:12:27,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.73 | bwd_microstep: 5169.88 | bwd_inner_microstep: 5128.57 | bwd_allreduce_microstep: 41.24 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2349 [2024-07-31 19:12:36,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.33 | bwd_microstep: 5285.67 | bwd_inner_microstep: 4875.40 | 
bwd_allreduce_microstep: 410.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829 [2024-07-31 19:12:45,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.31 | bwd_microstep: 5050.16 | bwd_inner_microstep: 5030.91 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2061 [2024-07-31 19:12:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.65 | bwd_microstep: 5155.80 | bwd_inner_microstep: 4755.13 | bwd_allreduce_microstep: 400.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 19:13:03,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.08 | bwd_microstep: 5165.26 | bwd_inner_microstep: 5107.52 | bwd_allreduce_microstep: 57.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 19:13:11,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.43 | bwd_microstep: 4986.25 | bwd_inner_microstep: 4935.65 | bwd_allreduce_microstep: 50.54 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670 [2024-07-31 19:13:20,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.15 | bwd_microstep: 4913.49 | bwd_inner_microstep: 4890.66 | bwd_allreduce_microstep: 22.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 19:13:29,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 19:13:29,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.44 | bwd_microstep: 5007.99 | bwd_inner_microstep: 4954.26 | bwd_allreduce_microstep: 53.66 | step_microstep: 182.08 [2024-07-31 19:13:29,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 28995.99 | bwd: 40734.47 | bwd_inner: 39678.03 | bwd_allreduce: 1055.96 | step: 182.67 70%|███████ | 866/1230 [17:01:34<7:04:40, 70.00s/it] {'loss': 1.1811, 'learning_rate': 4.252856389163128e-06, 'epoch': 0.7} 70%|███████ | 866/1230 [17:01:34<7:04:40, 70.00s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4050 [2024-07-31 19:13:38,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3875.10 | bwd_microstep: 5323.68 | bwd_inner_microstep: 5304.58 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3816 [2024-07-31 19:13:46,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.72 | bwd_microstep: 5105.48 | bwd_inner_microstep: 5064.08 | bwd_allreduce_microstep: 41.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 19:13:55,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3258.62 | bwd_microstep: 4898.45 | bwd_inner_microstep: 4873.24 | bwd_allreduce_microstep: 25.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 19:14:03,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.93 | bwd_microstep: 5265.36 | bwd_inner_microstep: 4856.69 | bwd_allreduce_microstep: 408.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 19:14:12,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.74 | bwd_microstep: 5116.29 | bwd_inner_microstep: 5049.56 | bwd_allreduce_microstep: 66.66 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 19:14:21,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.31 | bwd_microstep: 5195.86 | bwd_inner_microstep: 5115.17 | 
bwd_allreduce_microstep: 80.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670 [2024-07-31 19:14:30,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.12 | bwd_microstep: 4921.74 | bwd_inner_microstep: 4893.28 | bwd_allreduce_microstep: 28.39 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 19:14:38,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 19:14:38,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.34 | bwd_microstep: 4926.36 | bwd_inner_microstep: 4550.21 | bwd_allreduce_microstep: 376.07 | step_microstep: 181.73 [2024-07-31 19:14:38,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28280.77 | bwd: 40753.20 | bwd_inner: 39706.75 | bwd_allreduce: 1045.96 | step: 182.43 70%|███████ | 867/1230 [17:02:44<7:02:21, 69.81s/it] {'loss': 1.1459, 'learning_rate': 4.231326171931231e-06, 'epoch': 0.7} 70%|███████ | 867/1230 [17:02:44<7:02:21, 69.81s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 19:14:48,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3884.90 | bwd_microstep: 5727.60 | bwd_inner_microstep: 5708.43 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2788 [2024-07-31 19:14:56,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.30 | bwd_microstep: 5209.36 | bwd_inner_microstep: 4803.53 | bwd_allreduce_microstep: 405.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-31 19:15:05,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.76 | bwd_microstep: 5313.48 | bwd_inner_microstep: 5239.17 | bwd_allreduce_microstep: 74.24 | step_microstep: 0.09 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 19:15:14,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.77 | bwd_microstep: 5164.17 | bwd_inner_microstep: 5085.98 | bwd_allreduce_microstep: 78.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 19:15:23,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.60 | bwd_microstep: 5331.06 | bwd_inner_microstep: 5229.49 | bwd_allreduce_microstep: 101.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-31 19:15:32,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.58 | bwd_microstep: 4981.91 | bwd_inner_microstep: 4962.53 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-31 19:15:40,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.77 | bwd_microstep: 5041.25 | bwd_inner_microstep: 4651.66 | bwd_allreduce_microstep: 389.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 19:15:49,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 19:15:49,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.98 | bwd_microstep: 5139.44 | bwd_inner_microstep: 5071.65 | bwd_allreduce_microstep: 67.72 | step_microstep: 181.85 [2024-07-31 19:15:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29163.56 | bwd: 41908.24 | bwd_inner: 40752.39 | bwd_allreduce: 1155.36 | step: 182.46 71%|███████ | 868/1230 [17:03:55<7:04:04, 70.29s/it] {'loss': 1.1522, 'learning_rate': 4.209835957886196e-06, 'epoch': 0.71} 71%|███████ | 868/1230 [17:03:55<7:04:04, 70.29s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3969 [2024-07-31 
19:15:59,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.07 | bwd_microstep: 5598.75 | bwd_inner_microstep: 5520.97 | bwd_allreduce_microstep: 77.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-31 19:16:07,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.34 | bwd_microstep: 5037.94 | bwd_inner_microstep: 5018.60 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-07-31 19:16:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.21 | bwd_microstep: 5055.64 | bwd_inner_microstep: 5029.49 | bwd_allreduce_microstep: 26.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 19:16:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.19 | bwd_microstep: 5201.16 | bwd_inner_microstep: 5115.17 | bwd_allreduce_microstep: 85.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 19:16:34,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.78 | bwd_microstep: 5029.37 | bwd_inner_microstep: 5004.09 | bwd_allreduce_microstep: 25.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 19:16:43,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.91 | bwd_microstep: 5209.63 | bwd_inner_microstep: 5147.54 | bwd_allreduce_microstep: 62.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 19:16:51,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.75 | bwd_microstep: 4896.04 | bwd_inner_microstep: 4876.67 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3696 [2024-07-31 19:17:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 19:17:00,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.87 | bwd_microstep: 5024.31 | bwd_inner_microstep: 4970.86 | bwd_allreduce_microstep: 53.38 | step_microstep: 181.11 [2024-07-31 19:17:00,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29515.03 | bwd: 41052.82 | bwd_inner: 40683.35 | bwd_allreduce: 368.99 | step: 181.69 71%|███████ | 869/1230 [17:05:06<7:04:00, 70.47s/it] {'loss': 1.1482, 'learning_rate': 4.188385896053098e-06, 'epoch': 0.71} 71%|███████ | 869/1230 [17:05:06<7:04:00, 70.47s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2354 [2024-07-31 19:17:09,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.47 | bwd_microstep: 5343.67 | bwd_inner_microstep: 4932.54 | bwd_allreduce_microstep: 411.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574 [2024-07-31 19:17:17,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.98 | bwd_microstep: 4918.74 | bwd_inner_microstep: 4858.98 | bwd_allreduce_microstep: 59.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 19:17:25,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.49 | bwd_microstep: 4836.22 | bwd_inner_microstep: 4809.25 | bwd_allreduce_microstep: 26.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 19:17:33,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3176.61 | bwd_microstep: 4706.70 | bwd_inner_microstep: 4684.12 | bwd_allreduce_microstep: 22.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 19:17:42,598] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.64 | bwd_microstep: 5131.76 | bwd_inner_microstep: 5064.26 | bwd_allreduce_microstep: 67.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 19:17:51,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.11 | bwd_microstep: 4911.50 | bwd_inner_microstep: 4892.11 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 19:17:59,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.05 | bwd_microstep: 5090.71 | bwd_inner_microstep: 4698.02 | bwd_allreduce_microstep: 392.63 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-31 19:18:08,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 19:18:08,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.23 | bwd_microstep: 5017.04 | bwd_inner_microstep: 4996.74 | bwd_allreduce_microstep: 20.22 | step_microstep: 183.07 [2024-07-31 19:18:08,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27817.48 | bwd: 39956.33 | bwd_inner: 38935.98 | bwd_allreduce: 1019.86 | step: 183.66 71%|███████ | 870/1230 [17:06:14<6:58:34, 69.76s/it] {'loss': 1.1396, 'learning_rate': 4.166976135178575e-06, 'epoch': 0.71} 71%|███████ | 870/1230 [17:06:14<6:58:34, 69.76s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4082 [2024-07-31 19:18:18,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3857.65 | bwd_microstep: 5385.92 | bwd_inner_microstep: 5366.81 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-31 19:18:26,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.71 | 
bwd_microstep: 5194.36 | bwd_inner_microstep: 5106.84 | bwd_allreduce_microstep: 87.45 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 19:18:35,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.58 | bwd_microstep: 5245.65 | bwd_inner_microstep: 5161.18 | bwd_allreduce_microstep: 84.40 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 19:18:44,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.50 | bwd_microstep: 5169.75 | bwd_inner_microstep: 4767.96 | bwd_allreduce_microstep: 401.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 19:18:53,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.55 | bwd_microstep: 5037.82 | bwd_inner_microstep: 5014.42 | bwd_allreduce_microstep: 23.34 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2084 [2024-07-31 19:19:02,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.00 | bwd_microstep: 5160.06 | bwd_inner_microstep: 4757.85 | bwd_allreduce_microstep: 402.14 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 19:19:10,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.23 | bwd_microstep: 5158.77 | bwd_inner_microstep: 4757.42 | bwd_allreduce_microstep: 401.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 19:19:19,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-31 19:19:19,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.83 | bwd_microstep: 5042.00 | bwd_inner_microstep: 5013.39 | bwd_allreduce_microstep: 28.55 | step_microstep: 182.38 [2024-07-31 
19:19:19,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29220.96 | bwd: 41394.32 | bwd_inner: 39945.82 | bwd_allreduce: 1448.02 | step: 182.99 71%|███████ | 871/1230 [17:07:25<6:59:32, 70.12s/it] {'loss': 1.1558, 'learning_rate': 4.1456068237297964e-06, 'epoch': 0.71} 71%|███████ | 871/1230 [17:07:25<6:59:32, 70.12s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3974 [2024-07-31 19:19:29,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.03 | bwd_microstep: 5604.63 | bwd_inner_microstep: 5531.31 | bwd_allreduce_microstep: 73.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-31 19:19:38,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.52 | bwd_microstep: 5301.22 | bwd_inner_microstep: 5197.38 | bwd_allreduce_microstep: 103.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-31 19:19:46,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.71 | bwd_microstep: 5125.20 | bwd_inner_microstep: 5082.53 | bwd_allreduce_microstep: 42.61 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3699 [2024-07-31 19:19:54,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3086.28 | bwd_microstep: 4728.97 | bwd_inner_microstep: 4696.47 | bwd_allreduce_microstep: 32.44 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 19:20:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.04 | bwd_microstep: 5088.33 | bwd_inner_microstep: 5043.66 | bwd_allreduce_microstep: 44.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 19:20:12,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.12 | 
bwd_microstep: 5175.07 | bwd_inner_microstep: 4773.46 | bwd_allreduce_microstep: 401.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 19:20:20,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.30 | bwd_microstep: 4919.91 | bwd_inner_microstep: 4541.56 | bwd_allreduce_microstep: 378.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 19:20:28,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 19:20:28,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.68 | bwd_microstep: 5034.50 | bwd_inner_microstep: 4978.76 | bwd_allreduce_microstep: 55.67 | step_microstep: 181.69 [2024-07-31 19:20:28,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27776.58 | bwd: 40977.82 | bwd_inner: 39845.06 | bwd_allreduce: 1132.29 | step: 182.28 71%|███████ | 872/1230 [17:08:34<6:56:32, 69.81s/it] {'loss': 1.1432, 'learning_rate': 4.124278109893432e-06, 'epoch': 0.71} 71%|███████ | 872/1230 [17:08:34<6:56:32, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2485 [2024-07-31 19:20:37,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.83 | bwd_microstep: 5339.12 | bwd_inner_microstep: 4927.25 | bwd_allreduce_microstep: 411.81 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2252 [2024-07-31 19:20:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.10 | bwd_microstep: 5379.64 | bwd_inner_microstep: 4962.97 | bwd_allreduce_microstep: 416.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3565 [2024-07-31 19:20:55,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.39 | bwd_microstep: 5205.29 | bwd_inner_microstep: 5112.75 | 
bwd_allreduce_microstep: 92.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-31 19:21:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.55 | bwd_microstep: 4957.44 | bwd_inner_microstep: 4573.84 | bwd_allreduce_microstep: 383.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 19:21:12,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.64 | bwd_microstep: 5169.18 | bwd_inner_microstep: 4768.46 | bwd_allreduce_microstep: 400.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 19:21:21,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.11 | bwd_microstep: 5112.67 | bwd_inner_microstep: 5042.31 | bwd_allreduce_microstep: 70.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 19:21:29,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.59 | bwd_microstep: 5060.90 | bwd_inner_microstep: 5002.75 | bwd_allreduce_microstep: 58.07 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3163 [2024-07-31 19:21:38,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 19:21:38,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.28 | bwd_microstep: 5169.86 | bwd_inner_microstep: 4889.60 | bwd_allreduce_microstep: 280.19 | step_microstep: 181.72 [2024-07-31 19:21:38,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28130.40 | bwd: 41394.08 | bwd_inner: 39279.86 | bwd_allreduce: 2113.73 | step: 182.30 71%|███████ | 873/1230 [17:09:44<6:55:27, 69.82s/it] {'loss': 1.1315, 'learning_rate': 4.10299014157462e-06, 'epoch': 0.71} 71%|███████ | 873/1230 [17:09:44<6:55:27, 69.82s/it]dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2395 [2024-07-31 19:21:47,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.95 | bwd_microstep: 5370.74 | bwd_inner_microstep: 4959.31 | bwd_allreduce_microstep: 411.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 19:21:56,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.45 | bwd_microstep: 5382.50 | bwd_inner_microstep: 4967.60 | bwd_allreduce_microstep: 414.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 19:22:05,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.65 | bwd_microstep: 5021.47 | bwd_inner_microstep: 4994.56 | bwd_allreduce_microstep: 26.84 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3736 [2024-07-31 19:22:14,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.69 | bwd_microstep: 5250.36 | bwd_inner_microstep: 5164.64 | bwd_allreduce_microstep: 85.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-31 19:22:23,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.79 | bwd_microstep: 5221.97 | bwd_inner_microstep: 5131.14 | bwd_allreduce_microstep: 90.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 19:22:32,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.93 | bwd_microstep: 5223.03 | bwd_inner_microstep: 4818.78 | bwd_allreduce_microstep: 404.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 19:22:40,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.02 | bwd_microstep: 5051.62 | bwd_inner_microstep: 4658.61 | 
bwd_allreduce_microstep: 392.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 19:22:49,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 19:22:49,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.56 | bwd_microstep: 5003.26 | bwd_inner_microstep: 4952.38 | bwd_allreduce_microstep: 50.81 | step_microstep: 181.62 [2024-07-31 19:22:49,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28814.94 | bwd: 41524.94 | bwd_inner: 39646.97 | bwd_allreduce: 1877.49 | step: 182.19 71%|███████ | 874/1230 [17:10:55<6:55:47, 70.08s/it] {'loss': 1.1128, 'learning_rate': 4.0817430663959536e-06, 'epoch': 0.71} 71%|███████ | 874/1230 [17:10:55<6:55:47, 70.08s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-31 19:22:58,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.67 | bwd_microstep: 5132.22 | bwd_inner_microstep: 5058.28 | bwd_allreduce_microstep: 73.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-31 19:23:06,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.28 | bwd_microstep: 5328.93 | bwd_inner_microstep: 5201.54 | bwd_allreduce_microstep: 127.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3892 [2024-07-31 19:23:15,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.84 | bwd_microstep: 4976.58 | bwd_inner_microstep: 4954.65 | bwd_allreduce_microstep: 21.86 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-31 19:23:24,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.75 | bwd_microstep: 5136.90 | bwd_inner_microstep: 5064.76 | bwd_allreduce_microstep: 72.08 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 3780 [2024-07-31 19:23:32,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.43 | bwd_microstep: 5159.15 | bwd_inner_microstep: 5083.58 | bwd_allreduce_microstep: 75.50 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-31 19:23:41,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.33 | bwd_microstep: 4974.35 | bwd_inner_microstep: 4940.93 | bwd_allreduce_microstep: 33.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 19:23:50,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.72 | bwd_microstep: 5069.15 | bwd_inner_microstep: 5008.95 | bwd_allreduce_microstep: 60.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-07-31 19:23:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81 [2024-07-31 19:23:58,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.14 | bwd_microstep: 4920.67 | bwd_inner_microstep: 4896.66 | bwd_allreduce_microstep: 23.93 | step_microstep: 182.16 [2024-07-31 19:23:58,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28547.05 | bwd: 40697.94 | bwd_inner: 40209.28 | bwd_allreduce: 488.16 | step: 182.76 71%|███████ | 875/1230 [17:12:04<6:53:44, 69.93s/it] {'loss': 1.1229, 'learning_rate': 4.060537031696446e-06, 'epoch': 0.71} 71%|███████ | 875/1230 [17:12:04<6:53:44, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3900 [2024-07-31 19:24:07,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3447.13 | bwd_microstep: 5040.98 | bwd_inner_microstep: 5015.12 | bwd_allreduce_microstep: 25.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3882 [2024-07-31 
19:24:16,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.32 | bwd_microstep: 5126.31 | bwd_inner_microstep: 5107.01 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908 [2024-07-31 19:24:25,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.61 | bwd_microstep: 5184.66 | bwd_inner_microstep: 5165.36 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 19:24:34,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.62 | bwd_microstep: 5120.89 | bwd_inner_microstep: 5051.15 | bwd_allreduce_microstep: 69.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3789 [2024-07-31 19:24:42,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.53 | bwd_microstep: 5113.82 | bwd_inner_microstep: 5069.21 | bwd_allreduce_microstep: 44.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 19:24:51,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.28 | bwd_microstep: 5071.38 | bwd_inner_microstep: 5017.10 | bwd_allreduce_microstep: 54.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 19:25:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.70 | bwd_microstep: 5045.39 | bwd_inner_microstep: 4988.85 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 19:25:08,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 19:25:08,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.05 | bwd_microstep: 5080.71 | 
bwd_inner_microstep: 4686.90 | bwd_allreduce_microstep: 393.75 | step_microstep: 181.73 [2024-07-31 19:25:08,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28837.16 | bwd: 40784.11 | bwd_inner: 40100.64 | bwd_allreduce: 682.98 | step: 182.43 71%|███████ | 876/1230 [17:13:14<6:52:37, 69.94s/it] {'loss': 1.1211, 'learning_rate': 4.039372184530521e-06, 'epoch': 0.71} 71%|███████ | 876/1230 [17:13:14<6:52:37, 69.94s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2399 [2024-07-31 19:25:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3090.82 | bwd_microstep: 5107.25 | bwd_inner_microstep: 4714.83 | bwd_allreduce_microstep: 392.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4030 [2024-07-31 19:25:26,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3828.12 | bwd_microstep: 5265.35 | bwd_inner_microstep: 5246.11 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.10 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3764 [2024-07-31 19:25:34,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.33 | bwd_microstep: 4909.29 | bwd_inner_microstep: 4889.82 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 19:25:43,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.26 | bwd_microstep: 4985.55 | bwd_inner_microstep: 4966.14 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 19:25:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.44 | bwd_microstep: 4999.00 | bwd_inner_microstep: 4977.10 | bwd_allreduce_microstep: 21.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-31 
19:26:00,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3269.45 | bwd_microstep: 4732.49 | bwd_inner_microstep: 4709.99 | bwd_allreduce_microstep: 22.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-07-31 19:26:08,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.99 | bwd_microstep: 5102.18 | bwd_inner_microstep: 4705.57 | bwd_allreduce_microstep: 396.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 19:26:16,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 19:26:16,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.04 | bwd_microstep: 4917.35 | bwd_inner_microstep: 4540.72 | bwd_allreduce_microstep: 376.55 | step_microstep: 181.32 [2024-07-31 19:26:16,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27716.36 | bwd: 40018.44 | bwd_inner: 38750.22 | bwd_allreduce: 1267.72 | step: 181.91 71%|███████▏ | 877/1230 [17:14:22<6:48:09, 69.37s/it] {'loss': 1.169, 'learning_rate': 4.018248671666969e-06, 'epoch': 0.71} 71%|███████▏ | 877/1230 [17:14:22<6:48:09, 69.37s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-07-31 19:26:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.48 | bwd_microstep: 5343.27 | bwd_inner_microstep: 5275.92 | bwd_allreduce_microstep: 67.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 19:26:34,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.44 | bwd_microstep: 5284.37 | bwd_inner_microstep: 5191.83 | bwd_allreduce_microstep: 92.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 19:26:43,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3777.74 | bwd_microstep: 5160.26 | bwd_inner_microstep: 5121.33 | bwd_allreduce_microstep: 38.87 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3778 [2024-07-31 19:26:52,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.95 | bwd_microstep: 5120.59 | bwd_inner_microstep: 5061.16 | bwd_allreduce_microstep: 59.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 19:27:01,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.27 | bwd_microstep: 5237.14 | bwd_inner_microstep: 5152.93 | bwd_allreduce_microstep: 84.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-31 19:27:09,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3356.10 | bwd_microstep: 4912.74 | bwd_inner_microstep: 4885.46 | bwd_allreduce_microstep: 27.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-31 19:27:18,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.08 | bwd_microstep: 5011.94 | bwd_inner_microstep: 4624.28 | bwd_allreduce_microstep: 387.59 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3688 [2024-07-31 19:27:27,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 19:27:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.86 | bwd_microstep: 5172.89 | bwd_inner_microstep: 5087.90 | bwd_allreduce_microstep: 84.93 | step_microstep: 182.13 [2024-07-31 19:27:27,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28526.83 | bwd: 41243.19 | bwd_inner: 40400.76 | bwd_allreduce: 841.96 | step: 182.71 71%|███████▏ | 878/1230 [17:15:32<6:48:17, 69.59s/it] {'loss': 1.1072, 'learning_rate': 3.9971666395879605e-06, 'epoch': 
0.71} 71%|███████▏ | 878/1230 [17:15:32<6:48:17, 69.59s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2335 [2024-07-31 19:27:36,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.00 | bwd_microstep: 5647.70 | bwd_inner_microstep: 5223.84 | bwd_allreduce_microstep: 423.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3840 [2024-07-31 19:27:45,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.83 | bwd_microstep: 5089.92 | bwd_inner_microstep: 5066.23 | bwd_allreduce_microstep: 23.63 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2058 [2024-07-31 19:27:54,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.26 | bwd_microstep: 5417.20 | bwd_inner_microstep: 4998.38 | bwd_allreduce_microstep: 418.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-31 19:28:02,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.42 | bwd_microstep: 4865.33 | bwd_inner_microstep: 4837.98 | bwd_allreduce_microstep: 27.28 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2255 [2024-07-31 19:28:11,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.90 | bwd_microstep: 5256.29 | bwd_inner_microstep: 4849.56 | bwd_allreduce_microstep: 406.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-31 19:28:20,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.35 | bwd_microstep: 5025.35 | bwd_inner_microstep: 4998.95 | bwd_allreduce_microstep: 26.33 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3658 [2024-07-31 19:28:28,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3600.83 | bwd_microstep: 5080.86 | bwd_inner_microstep: 4995.95 | bwd_allreduce_microstep: 84.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 19:28:37,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 19:28:37,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.13 | bwd_microstep: 5038.14 | bwd_inner_microstep: 4998.39 | bwd_allreduce_microstep: 39.69 | step_microstep: 183.75 [2024-07-31 19:28:37,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28812.63 | bwd: 41420.77 | bwd_inner: 39969.23 | bwd_allreduce: 1451.06 | step: 184.35 71%|███████▏ | 879/1230 [17:16:43<6:48:49, 69.89s/it] {'loss': 1.1476, 'learning_rate': 3.9761262344880096e-06, 'epoch': 0.71} 71%|███████▏ | 879/1230 [17:16:43<6:48:49, 69.89s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4042 [2024-07-31 19:28:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.17 | bwd_microstep: 5325.51 | bwd_inner_microstep: 5276.55 | bwd_allreduce_microstep: 48.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-31 19:28:55,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.15 | bwd_microstep: 5045.44 | bwd_inner_microstep: 5026.04 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3614 [2024-07-31 19:29:04,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.36 | bwd_microstep: 5261.00 | bwd_inner_microstep: 5144.04 | bwd_allreduce_microstep: 116.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 19:29:13,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.44 | bwd_microstep: 5173.81 | bwd_inner_microstep: 5088.85 | 
bwd_allreduce_microstep: 84.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 19:29:21,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.29 | bwd_microstep: 5011.96 | bwd_inner_microstep: 4623.60 | bwd_allreduce_microstep: 388.29 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 19:29:30,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.34 | bwd_microstep: 5037.23 | bwd_inner_microstep: 5010.64 | bwd_allreduce_microstep: 26.52 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3716 [2024-07-31 19:29:38,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3149.42 | bwd_microstep: 4933.01 | bwd_inner_microstep: 4896.99 | bwd_allreduce_microstep: 35.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 19:29:46,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 19:29:46,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3006.87 | bwd_microstep: 4878.30 | bwd_inner_microstep: 4504.02 | bwd_allreduce_microstep: 374.22 | step_microstep: 181.86 [2024-07-31 19:29:46,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27678.93 | bwd: 40666.23 | bwd_inner: 39570.67 | bwd_allreduce: 1095.07 | step: 182.45 72%|███████▏ | 880/1230 [17:17:52<6:45:32, 69.52s/it] {'loss': 1.0853, 'learning_rate': 3.9551276022729644e-06, 'epoch': 0.72} 72%|███████▏ | 880/1230 [17:17:52<6:45:32, 69.52s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4023 [2024-07-31 19:29:55,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.43 | bwd_microstep: 5079.49 | bwd_inner_microstep: 5060.35 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.09 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3897 [2024-07-31 19:30:03,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.18 | bwd_microstep: 5085.51 | bwd_inner_microstep: 5050.03 | bwd_allreduce_microstep: 35.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 19:30:11,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.98 | bwd_microstep: 4982.54 | bwd_inner_microstep: 4602.32 | bwd_allreduce_microstep: 380.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 19:30:20,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.88 | bwd_microstep: 5105.45 | bwd_inner_microstep: 5037.31 | bwd_allreduce_microstep: 68.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 19:30:29,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.72 | bwd_microstep: 5129.92 | bwd_inner_microstep: 5057.27 | bwd_allreduce_microstep: 72.59 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3635 [2024-07-31 19:30:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.58 | bwd_microstep: 5053.06 | bwd_inner_microstep: 4970.85 | bwd_allreduce_microstep: 82.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 19:30:46,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.62 | bwd_microstep: 4928.61 | bwd_inner_microstep: 4904.81 | bwd_allreduce_microstep: 23.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 19:30:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 19:30:55,367] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3686.34 | bwd_microstep: 4878.01 | bwd_inner_microstep: 4858.53 | bwd_allreduce_microstep: 19.41 | step_microstep: 181.72 [2024-07-31 19:30:55,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28467.63 | bwd: 40242.56 | bwd_inner: 39541.41 | bwd_allreduce: 700.67 | step: 182.31 72%|███████▏ | 881/1230 [17:19:01<6:43:33, 69.38s/it] {'loss': 1.1687, 'learning_rate': 3.934170888559e-06, 'epoch': 0.72} 72%|███████▏ | 881/1230 [17:19:01<6:43:33, 69.38s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 19:31:05,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3867.52 | bwd_microstep: 5745.20 | bwd_inner_microstep: 5726.18 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3858 [2024-07-31 19:31:13,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.46 | bwd_microstep: 5045.03 | bwd_inner_microstep: 5009.02 | bwd_allreduce_microstep: 35.94 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3760 [2024-07-31 19:31:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.67 | bwd_microstep: 5187.56 | bwd_inner_microstep: 5121.40 | bwd_allreduce_microstep: 66.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 19:31:30,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3353.87 | bwd_microstep: 5064.42 | bwd_inner_microstep: 4999.25 | bwd_allreduce_microstep: 65.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 19:31:39,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.42 | bwd_microstep: 5145.07 | bwd_inner_microstep: 5071.32 | bwd_allreduce_microstep: 73.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3714
[2024-07-31 19:31:48,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.46 | bwd_microstep: 5142.23 | bwd_inner_microstep: 5086.92 | bwd_allreduce_microstep: 55.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 19:31:57,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.32 | bwd_microstep: 5037.70 | bwd_inner_microstep: 4648.99 | bwd_allreduce_microstep: 388.64 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147
[2024-07-31 19:32:05,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 19:32:05,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3440.69 | bwd_microstep: 5030.62 | bwd_inner_microstep: 4641.11 | bwd_allreduce_microstep: 389.45 | step_microstep: 183.12
[2024-07-31 19:32:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28590.32 | bwd: 41397.81 | bwd_inner: 40304.13 | bwd_allreduce: 1093.20 | step: 183.70
72%|███████▏ | 882/1230 [17:20:11<6:44:02, 69.66s/it]
{'loss': 1.1539, 'learning_rate': 3.913256238671607e-06, 'epoch': 0.72}
dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2784
[2024-07-31 19:32:14,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.23 | bwd_microstep: 5561.88 | bwd_inner_microstep: 5132.80 | bwd_allreduce_microstep: 429.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749
[2024-07-31 19:32:24,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3820.86 | bwd_microstep: 5343.86 | bwd_inner_microstep: 5282.39 | bwd_allreduce_microstep: 61.41 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3779
[2024-07-31 19:32:33,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.68 | bwd_microstep: 5334.84 | bwd_inner_microstep: 5272.72 | bwd_allreduce_microstep: 62.04 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3641
[2024-07-31 19:32:41,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.57 | bwd_microstep: 5130.11 | bwd_inner_microstep: 5070.19 | bwd_allreduce_microstep: 59.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768
[2024-07-31 19:32:50,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.21 | bwd_microstep: 5103.69 | bwd_inner_microstep: 5060.09 | bwd_allreduce_microstep: 43.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713
[2024-07-31 19:32:59,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.31 | bwd_microstep: 5030.15 | bwd_inner_microstep: 5003.84 | bwd_allreduce_microstep: 26.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791
[2024-07-31 19:33:08,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.97 | bwd_microstep: 5015.01 | bwd_inner_microstep: 4982.88 | bwd_allreduce_microstep: 32.07 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686
[2024-07-31 19:33:16,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66
[2024-07-31 19:33:16,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.86 | bwd_microstep: 4877.10 | bwd_inner_microstep: 4857.75 | bwd_allreduce_microstep: 19.27 | step_microstep: 181.63
[2024-07-31 19:33:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29407.59 | bwd: 41396.63 | bwd_inner: 40662.60 | bwd_allreduce: 733.53 | step: 182.22
72%|███████▏ | 883/1230 [17:21:22<6:45:26, 70.11s/it]
{'loss': 1.0958, 'learning_rate': 3.89238379764457e-06, 'epoch': 0.72}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3868
[2024-07-31 19:33:25,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3456.88 | bwd_microstep: 5286.78 | bwd_inner_microstep: 5235.03 | bwd_allreduce_microstep: 51.68 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2313
[2024-07-31 19:33:34,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.28 | bwd_microstep: 5103.29 | bwd_inner_microstep: 4706.68 | bwd_allreduce_microstep: 396.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584
[2024-07-31 19:33:42,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.04 | bwd_microstep: 5138.80 | bwd_inner_microstep: 5056.10 | bwd_allreduce_microstep: 82.64 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221
[2024-07-31 19:33:51,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.97 | bwd_microstep: 5118.79 | bwd_inner_microstep: 4720.42 | bwd_allreduce_microstep: 398.30 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695
[2024-07-31 19:34:00,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.20 | bwd_microstep: 4916.79 | bwd_inner_microstep: 4891.34 | bwd_allreduce_microstep: 25.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654
[2024-07-31 19:34:08,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.39 | bwd_microstep: 4960.44 | bwd_inner_microstep: 4914.19 | bwd_allreduce_microstep: 46.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702
[2024-07-31 19:34:17,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.81 | bwd_microstep: 5037.87 | bwd_inner_microstep: 4986.42 | bwd_allreduce_microstep: 51.39 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3169
[2024-07-31 19:34:26,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 19:34:26,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.84 | bwd_microstep: 5066.82 | bwd_inner_microstep: 4907.29 | bwd_allreduce_microstep: 159.44 | step_microstep: 181.51
[2024-07-31 19:34:26,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28344.31 | bwd: 40629.57 | bwd_inner: 39417.41 | bwd_allreduce: 1211.67 | step: 182.10
72%|███████▏ | 884/1230 [17:22:32<6:42:53, 69.86s/it]
{'loss': 1.1998, 'learning_rate': 3.871553710218988e-06, 'epoch': 0.72}
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3836
[2024-07-31 19:34:35,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.90 | bwd_microstep: 5301.27 | bwd_inner_microstep: 5233.62 | bwd_allreduce_microstep: 67.59 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3811
[2024-07-31 19:34:44,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.93 | bwd_microstep: 5445.55 | bwd_inner_microstep: 5356.96 | bwd_allreduce_microstep: 88.52 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2718
[2024-07-31 19:34:53,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.30 | bwd_microstep: 5191.81 | bwd_inner_microstep: 4789.19 | bwd_allreduce_microstep: 402.55 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749
[2024-07-31 19:35:01,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.19 | bwd_microstep: 5157.78 | bwd_inner_microstep: 5105.31 | bwd_allreduce_microstep: 52.41 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608
[2024-07-31 19:35:10,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.81 | bwd_microstep: 5113.41 | bwd_inner_microstep: 5042.28 | bwd_allreduce_microstep: 71.06 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 19:35:19,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.82 | bwd_microstep: 5012.80 | bwd_inner_microstep: 4956.14 | bwd_allreduce_microstep: 56.57 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726
[2024-07-31 19:35:27,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.81 | bwd_microstep: 4990.27 | bwd_inner_microstep: 4958.71 | bwd_allreduce_microstep: 31.49 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3214
[2024-07-31 19:35:36,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 19:35:36,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.47 | bwd_microstep: 5022.93 | bwd_inner_microstep: 4872.31 | bwd_allreduce_microstep: 150.55 | step_microstep: 181.54
[2024-07-31 19:35:36,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28788.15 | bwd: 41235.80 | bwd_inner: 40314.47 | bwd_allreduce: 920.85 | step: 182.23
72%|███████▏ | 885/1230 [17:23:42<6:42:34, 70.01s/it]
{'loss': 1.1589, 'learning_rate': 3.850766120842252e-06, 'epoch': 0.72}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3546
[2024-07-31 19:35:45,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.35 | bwd_microstep: 5542.68 | bwd_inner_microstep: 5354.46 | bwd_allreduce_microstep: 188.16 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2130
[2024-07-31 19:35:54,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.31 | bwd_microstep: 5311.16 | bwd_inner_microstep: 4901.81 | bwd_allreduce_microstep: 409.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794
[2024-07-31 19:36:03,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.40 | bwd_microstep: 5220.75 | bwd_inner_microstep: 5163.36 | bwd_allreduce_microstep: 57.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781
[2024-07-31 19:36:12,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.49 | bwd_microstep: 5046.96 | bwd_inner_microstep: 5025.81 | bwd_allreduce_microstep: 21.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 19:36:21,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.79 | bwd_microstep: 5142.51 | bwd_inner_microstep: 5061.29 | bwd_allreduce_microstep: 81.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769
[2024-07-31 19:36:29,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.00 | bwd_microstep: 5181.14 | bwd_inner_microstep: 5125.95 | bwd_allreduce_microstep: 55.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696
[2024-07-31 19:36:38,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.04 | bwd_microstep: 5057.07 | bwd_inner_microstep: 4998.69 | bwd_allreduce_microstep: 58.30 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691
[2024-07-31 19:36:47,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 19:36:47,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.01 | bwd_microstep: 4988.86 | bwd_inner_microstep: 4938.88 | bwd_allreduce_microstep: 49.91 | step_microstep: 181.50
[2024-07-31 19:36:47,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29024.30 | bwd: 41491.12 | bwd_inner: 40570.19 | bwd_allreduce: 920.45 | step: 182.08
72%|███████▏ | 886/1230 [17:24:53<6:42:50, 70.26s/it]
{'loss': 1.1514, 'learning_rate': 3.830021173667048e-06, 'epoch': 0.72}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3862
[2024-07-31 19:36:56,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3857.11 | bwd_microstep: 5442.61 | bwd_inner_microstep: 5380.51 | bwd_allreduce_microstep: 62.03 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2301
[2024-07-31 19:37:05,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.46 | bwd_microstep: 5320.23 | bwd_inner_microstep: 4911.83 | bwd_allreduce_microstep: 408.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791
[2024-07-31 19:37:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.90 | bwd_microstep: 5027.55 | bwd_inner_microstep: 5008.25 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208
[2024-07-31 19:37:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.33 | bwd_microstep: 5230.17 | bwd_inner_microstep: 4824.68 | bwd_allreduce_microstep: 405.43 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3628
[2024-07-31 19:37:31,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.58 | bwd_microstep: 5134.56 | bwd_inner_microstep: 5044.22 | bwd_allreduce_microstep: 90.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647
[2024-07-31 19:37:40,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.13 | bwd_microstep: 5055.33 | bwd_inner_microstep: 4992.60 | bwd_allreduce_microstep: 62.67 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643
[2024-07-31 19:37:48,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3164.98 | bwd_microstep: 4695.20 | bwd_inner_microstep: 4671.86 | bwd_allreduce_microstep: 23.28 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779
[2024-07-31 19:37:56,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76
[2024-07-31 19:37:56,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.30 | bwd_microstep: 4842.54 | bwd_inner_microstep: 4823.22 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.70
[2024-07-31 19:37:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28266.69 | bwd: 40748.17 | bwd_inner: 39657.11 | bwd_allreduce: 1090.58 | step: 182.27
72%|███████▏ | 887/1230 [17:26:02<6:40:05, 69.99s/it]
{'loss': 1.1087, 'learning_rate': 3.809319012550352e-06, 'epoch': 0.72}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3554
[2024-07-31 19:38:05,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.07 | bwd_microstep: 5471.76 | bwd_inner_microstep: 5279.78 | bwd_allreduce_microstep: 191.91 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604
[2024-07-31 19:38:14,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.99 | bwd_microstep: 5210.62 | bwd_inner_microstep: 5124.75 | bwd_allreduce_microstep: 85.81 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636
[2024-07-31 19:38:23,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.59 | bwd_microstep: 5192.87 | bwd_inner_microstep: 5108.17 | bwd_allreduce_microstep: 84.63 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725
[2024-07-31 19:38:32,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.37 | bwd_microstep: 5165.76 | bwd_inner_microstep: 5108.92 | bwd_allreduce_microstep: 56.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783
[2024-07-31 19:38:40,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.80 | bwd_microstep: 4868.74 | bwd_inner_microstep: 4845.97 | bwd_allreduce_microstep: 22.70 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2167
[2024-07-31 19:38:49,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3464.19 | bwd_microstep: 5068.71 | bwd_inner_microstep: 4674.19 | bwd_allreduce_microstep: 394.46 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3717
[2024-07-31 19:38:57,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.45 | bwd_microstep: 4921.49 | bwd_inner_microstep: 4902.12 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 19:39:06,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 19:39:06,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.71 | bwd_microstep: 5010.63 | bwd_inner_microstep: 4976.30 | bwd_allreduce_microstep: 34.27 | step_microstep: 182.20
[2024-07-31 19:39:06,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28705.10 | bwd: 40910.57 | bwd_inner: 40020.13 | bwd_allreduce: 889.95 | step: 182.79
72%|███████▏ | 888/1230 [17:27:12<6:38:51, 69.97s/it]
{'loss': 1.1364, 'learning_rate': 3.788659781052444e-06, 'epoch': 0.72}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3868
[2024-07-31 19:39:15,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.98 | bwd_microstep: 5579.97 | bwd_inner_microstep: 5477.88 | bwd_allreduce_microstep: 102.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786
[2024-07-31 19:39:24,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.85 | bwd_microstep: 5177.69 | bwd_inner_microstep: 5126.62 | bwd_allreduce_microstep: 50.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721
[2024-07-31 19:39:33,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.47 | bwd_microstep: 5213.06 | bwd_inner_microstep: 5149.10 | bwd_allreduce_microstep: 63.90 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763
[2024-07-31 19:39:42,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.64 | bwd_microstep: 5066.98 | bwd_inner_microstep: 5036.77 | bwd_allreduce_microstep: 30.14 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618
[2024-07-31 19:39:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.64 | bwd_microstep: 4871.25 | bwd_inner_microstep: 4823.84 | bwd_allreduce_microstep: 47.33 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738
[2024-07-31 19:39:59,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.03 | bwd_microstep: 5039.49 | bwd_inner_microstep: 5017.07 | bwd_allreduce_microstep: 22.36 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 19:40:08,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.21 | bwd_microstep: 5077.15 | bwd_inner_microstep: 5011.08 | bwd_allreduce_microstep: 66.00 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2173
[2024-07-31 19:40:16,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 19:40:16,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.48 | bwd_microstep: 5081.98 | bwd_inner_microstep: 4687.45 | bwd_allreduce_microstep: 394.47 | step_microstep: 181.85
[2024-07-31 19:40:16,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28872.20 | bwd: 41107.55 | bwd_inner: 40329.76 | bwd_allreduce: 777.32 | step: 182.43
72%|███████▏ | 889/1230 [17:28:22<6:38:16, 70.08s/it]
{'loss': 1.1737, 'learning_rate': 3.768043622435905e-06, 'epoch': 0.72}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3792
[2024-07-31 19:40:26,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.18 | bwd_microstep: 5616.09 | bwd_inner_microstep: 5437.94 | bwd_allreduce_microstep: 178.09 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2231
[2024-07-31 19:40:35,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.69 | bwd_microstep: 5335.04 | bwd_inner_microstep: 4921.36 | bwd_allreduce_microstep: 413.60 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774
[2024-07-31 19:40:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.46 | bwd_microstep: 5004.03 | bwd_inner_microstep: 4980.53 | bwd_allreduce_microstep: 23.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 19:40:52,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.27 | bwd_microstep: 5206.73 | bwd_inner_microstep: 5119.91 | bwd_allreduce_microstep: 86.75 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746
[2024-07-31 19:41:01,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.07 | bwd_microstep: 5008.33 | bwd_inner_microstep: 4972.95 | bwd_allreduce_microstep: 35.31 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666
[2024-07-31 19:41:10,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.54 | bwd_microstep: 5136.71 | bwd_inner_microstep: 5068.39 | bwd_allreduce_microstep: 68.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716
[2024-07-31 19:41:18,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.28 | bwd_microstep: 5069.53 | bwd_inner_microstep: 5025.69 | bwd_allreduce_microstep: 43.76 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160
[2024-07-31 19:41:27,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 19:41:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3041.59 | bwd_microstep: 4915.66 | bwd_inner_microstep: 4535.78 | bwd_allreduce_microstep: 379.81 | step_microstep: 181.58
[2024-07-31 19:41:27,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28483.00 | bwd: 41292.08 | bwd_inner: 40062.47 | bwd_allreduce: 1229.12 | step: 182.18
72%|███████▏ | 890/1230 [17:29:32<6:37:08, 70.09s/it]
{'loss': 1.1584, 'learning_rate': 3.7474706796646275e-06, 'epoch': 0.72}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 19:41:36,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.57 | bwd_microstep: 5332.65 | bwd_inner_microstep: 5313.65 | bwd_allreduce_microstep: 18.93 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3956
[2024-07-31 19:41:45,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.26 | bwd_microstep: 5187.60 | bwd_inner_microstep: 5168.25 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797
[2024-07-31 19:41:53,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.25 | bwd_microstep: 4949.53 | bwd_inner_microstep: 4921.10 | bwd_allreduce_microstep: 28.36 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724
[2024-07-31 19:42:02,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.40 | bwd_microstep: 5182.48 | bwd_inner_microstep: 5128.02 | bwd_allreduce_microstep: 54.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3684
[2024-07-31 19:42:11,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.98 | bwd_microstep: 5128.94 | bwd_inner_microstep: 5046.77 | bwd_allreduce_microstep: 82.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622
[2024-07-31 19:42:20,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.35 | bwd_microstep: 5154.91 | bwd_inner_microstep: 5075.32 | bwd_allreduce_microstep: 79.51 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651
[2024-07-31 19:42:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.38 | bwd_microstep: 4995.30 | bwd_inner_microstep: 4942.69 | bwd_allreduce_microstep: 52.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 19:42:37,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66
[2024-07-31 19:42:37,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.43 | bwd_microstep: 5179.15 | bwd_inner_microstep: 5098.71 | bwd_allreduce_microstep: 80.37 | step_microstep: 181.21
[2024-07-31 19:42:37,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29259.53 | bwd: 41110.54 | bwd_inner: 40694.45 | bwd_allreduce: 415.61 | step: 181.79
72%|███████▏ | 891/1230 [17:30:43<6:37:01, 70.27s/it]
{'loss': 1.1401, 'learning_rate': 3.7269410954028073e-06, 'epoch': 0.72}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2361
[2024-07-31 19:42:46,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.62 | bwd_microstep: 5296.31 | bwd_inner_microstep: 4888.28 | bwd_allreduce_microstep: 407.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639
[2024-07-31 19:42:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.88 | bwd_microstep: 5210.15 | bwd_inner_microstep: 5127.51 | bwd_allreduce_microstep: 82.57 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784
[2024-07-31 19:43:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.11 | bwd_microstep: 5036.30 | bwd_inner_microstep: 5016.96 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 19:43:13,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.41 | bwd_microstep: 5090.88 | bwd_inner_microstep: 5056.99 | bwd_allreduce_microstep: 33.81 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204
[2024-07-31 19:43:21,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.56 | bwd_microstep: 5146.88 | bwd_inner_microstep: 4748.73 | bwd_allreduce_microstep: 398.08 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164
[2024-07-31 19:43:30,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.35 | bwd_microstep: 5191.06 | bwd_inner_microstep: 4786.26 | bwd_allreduce_microstep: 404.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705
[2024-07-31 19:43:39,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.87 | bwd_microstep: 4992.83 | bwd_inner_microstep: 4938.91 | bwd_allreduce_microstep: 53.86 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2134
[2024-07-31 19:43:47,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 19:43:47,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.27 | bwd_microstep: 5076.62 | bwd_inner_microstep: 4683.30 | bwd_allreduce_microstep: 393.24 | step_microstep: 183.09
[2024-07-31 19:43:47,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28776.97 | bwd: 41041.00 | bwd_inner: 39246.89 | bwd_allreduce: 1793.63 | step: 183.67
73%|███████▎ | 892/1230 [17:31:53<6:35:39, 70.23s/it]
{'loss': 1.1456, 'learning_rate': 3.706455012013994e-06, 'epoch': 0.73}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 19:43:57,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.63 | bwd_microstep: 5368.98 | bwd_inner_microstep: 5349.84 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754
[2024-07-31 19:44:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.54 | bwd_microstep: 5300.54 | bwd_inner_microstep: 5229.19 | bwd_allreduce_microstep: 71.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2299
[2024-07-31 19:44:14,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.91 | bwd_microstep: 5294.90 | bwd_inner_microstep: 4885.97 | bwd_allreduce_microstep: 408.86 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241
[2024-07-31 19:44:23,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.46 | bwd_microstep: 5229.22 | bwd_inner_microstep: 4822.92 | bwd_allreduce_microstep: 406.23 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739
[2024-07-31 19:44:32,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.00 | bwd_microstep: 5124.78 | bwd_inner_microstep: 5071.58 | bwd_allreduce_microstep: 53.14 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782
[2024-07-31 19:44:41,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.82 | bwd_microstep: 5027.38 | bwd_inner_microstep: 5008.01 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 19:44:50,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.73 | bwd_microstep: 5100.11 | bwd_inner_microstep: 4703.38 | bwd_allreduce_microstep: 396.66 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708
[2024-07-31 19:44:59,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75
[2024-07-31 19:44:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.20 | bwd_microstep: 4901.18 | bwd_inner_microstep: 4879.77 | bwd_allreduce_microstep: 21.34 | step_microstep: 182.63
[2024-07-31 19:44:59,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29443.20 | bwd: 41347.08 | bwd_inner: 39950.61 | bwd_allreduce: 1395.98 | step: 183.22
73%|███████▎ | 893/1230 [17:33:04<6:35:58, 70.50s/it]
{'loss': 1.1236, 'learning_rate': 3.6860125715600513e-06, 'epoch': 0.73}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2508
[2024-07-31 19:45:07,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.16 | bwd_microstep: 5294.96 | bwd_inner_microstep: 4887.29 | bwd_allreduce_microstep: 407.60 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304
[2024-07-31 19:45:16,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.39 | bwd_microstep: 5357.93 | bwd_inner_microstep: 4939.78 | bwd_allreduce_microstep: 418.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594
[2024-07-31 19:45:25,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.27 | bwd_microstep: 5143.20 | bwd_inner_microstep: 5066.08 | bwd_allreduce_microstep: 77.05 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3631
[2024-07-31 19:45:34,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3336.75 | bwd_microstep: 5046.68 | bwd_inner_microstep: 4966.69 | bwd_allreduce_microstep: 79.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197
[2024-07-31 19:45:42,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.81 | bwd_microstep: 5070.46 | bwd_inner_microstep: 4677.29 | bwd_allreduce_microstep: 393.11 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686
[2024-07-31 19:45:51,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.55 | bwd_microstep: 4886.67 | bwd_inner_microstep: 4865.71 | bwd_allreduce_microstep: 20.88 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638
[2024-07-31 19:45:59,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.15 | bwd_microstep: 4947.73 | bwd_inner_microstep: 4900.87 | bwd_allreduce_microstep: 46.79 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2131
[2024-07-31 19:46:08,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 19:46:08,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.25 | bwd_microstep: 5231.57 | bwd_inner_microstep: 4825.61 | bwd_allreduce_microstep: 405.88 | step_microstep: 181.31
[2024-07-31 19:46:08,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28356.24 | bwd: 40979.16 | bwd_inner: 39129.27 | bwd_allreduce: 1849.39 | step: 181.98
73%|███████▎ | 894/1230 [17:34:14<6:33:23, 70.25s/it]
{'loss': 1.1432, 'learning_rate': 3.665613915800217e-06, 'epoch': 0.73}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 19:46:17,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.84 | bwd_microstep: 5217.44 | bwd_inner_microstep: 5198.28 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829
[2024-07-31 19:46:26,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.48 | bwd_microstep: 5049.00 | bwd_inner_microstep: 5029.67 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250
[2024-07-31 19:46:35,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.34 | bwd_microstep: 5246.66 | bwd_inner_microstep: 4839.87 | bwd_allreduce_microstep: 406.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874
[2024-07-31 19:46:44,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.76 | bwd_microstep: 5121.39 | bwd_inner_microstep: 5102.12 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629
[2024-07-31 19:46:52,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.98 | bwd_microstep: 5108.73 | bwd_inner_microstep: 5038.60 | bwd_allreduce_microstep: 70.06 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 19:47:01,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.60 | bwd_microstep: 5001.69 | bwd_inner_microstep: 4982.36 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756
[2024-07-31 19:47:10,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.28 | bwd_microstep: 5016.88 | bwd_inner_microstep: 4997.53 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704
[2024-07-31 19:47:19,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 19:47:19,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.31 | bwd_microstep: 4899.18 | bwd_inner_microstep: 4877.04 | bwd_allreduce_microstep: 22.08 | step_microstep: 182.01
[2024-07-31 19:47:19,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29600.48 | bwd: 40660.95 | bwd_inner: 40065.41 | bwd_allreduce: 595.05 | step: 182.60
73%|███████▎ | 895/1230 [17:35:25<6:32:48, 70.35s/it]
{'loss': 1.1848, 'learning_rate': 3.6452591861900857e-06, 'epoch': 0.73}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4011
[2024-07-31 19:47:28,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.23 | bwd_microstep: 5164.91 | bwd_inner_microstep: 5144.09 | bwd_allreduce_microstep: 20.75 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3875
[2024-07-31 19:47:37,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.81 | bwd_microstep: 5135.44 | bwd_inner_microstep: 5116.04 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602
[2024-07-31 19:47:45,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.75 | bwd_microstep: 5167.41 | bwd_inner_microstep: 5090.18 | bwd_allreduce_microstep: 77.16 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3836
[2024-07-31 19:47:54,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.85 | bwd_microstep: 5047.98 | bwd_inner_microstep: 5028.68 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822
[2024-07-31 19:48:03,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.69 | bwd_microstep: 5056.50 | bwd_inner_microstep: 5037.22 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 19:48:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.31 | bwd_microstep: 4993.49 | bwd_inner_microstep: 4974.17 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3151
[2024-07-31 19:48:20,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.98 | bwd_microstep: 4994.38 | bwd_inner_microstep: 4816.42 | bwd_allreduce_microstep: 177.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 19:48:29,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 19:48:29,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.55 | bwd_microstep: 4892.39 | bwd_inner_microstep: 4872.96 | bwd_allreduce_microstep: 19.36 | step_microstep: 181.21
[2024-07-31 19:48:29,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29554.07 | bwd: 40452.47 | bwd_inner: 40079.71 | bwd_allreduce: 372.27 | step: 181.78
73%|███████▎ | 896/1230 [17:36:35<6:31:37, 70.35s/it]
{'loss': 1.1296, 'learning_rate': 3.6249485238806637e-06, 'epoch': 0.73}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 19:48:38,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.08 | bwd_microstep: 5331.21 | bwd_inner_microstep: 5312.16 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3573
[2024-07-31 19:48:47,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.31 | bwd_microstep: 5294.45 | bwd_inner_microstep: 5208.64 | bwd_allreduce_microstep: 85.75 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583
[2024-07-31 19:48:56,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.23 | bwd_microstep: 5139.28 | bwd_inner_microstep: 5055.09 | bwd_allreduce_microstep: 84.13 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3606
[2024-07-31 19:49:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.00 | bwd_microstep: 5134.34 | bwd_inner_microstep: 5043.61 | bwd_allreduce_microstep: 90.66 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724
[2024-07-31 19:49:14,045]
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.02 | bwd_microstep: 5021.07 | bwd_inner_microstep: 4993.80 | bwd_allreduce_microstep: 27.21 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2204 [2024-07-31 19:49:22,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.84 | bwd_microstep: 5217.80 | bwd_inner_microstep: 4810.79 | bwd_allreduce_microstep: 406.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 19:49:31,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.00 | bwd_microstep: 5147.19 | bwd_inner_microstep: 4747.60 | bwd_allreduce_microstep: 399.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-31 19:49:39,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-31 19:49:39,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.28 | bwd_microstep: 4794.86 | bwd_inner_microstep: 4775.50 | bwd_allreduce_microstep: 19.29 | step_microstep: 182.29 [2024-07-31 19:49:39,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28701.68 | bwd: 41080.19 | bwd_inner: 39947.13 | bwd_allreduce: 1132.56 | step: 182.87 73%|███████▎ | 897/1230 [17:37:45<6:30:03, 70.28s/it] {'loss': 1.1462, 'learning_rate': 3.6046820697173514e-06, 'epoch': 0.73} 73%|███████▎ | 897/1230 [17:37:45<6:30:03, 70.28s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4030 [2024-07-31 19:49:48,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.76 | bwd_microstep: 5288.47 | bwd_inner_microstep: 5269.35 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2278 [2024-07-31 19:49:57,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.93 
| bwd_microstep: 5156.79 | bwd_inner_microstep: 4755.54 | bwd_allreduce_microstep: 401.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 19:50:06,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.03 | bwd_microstep: 5223.19 | bwd_inner_microstep: 4817.21 | bwd_allreduce_microstep: 405.91 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2224 [2024-07-31 19:50:15,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.65 | bwd_microstep: 5119.20 | bwd_inner_microstep: 4721.31 | bwd_allreduce_microstep: 397.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3712 [2024-07-31 19:50:23,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.52 | bwd_microstep: 5038.38 | bwd_inner_microstep: 4967.27 | bwd_allreduce_microstep: 71.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 19:50:31,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.43 | bwd_microstep: 4793.56 | bwd_inner_microstep: 4774.11 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 19:50:40,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.42 | bwd_microstep: 5203.06 | bwd_inner_microstep: 5145.95 | bwd_allreduce_microstep: 57.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 19:50:48,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 19:50:48,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.85 | bwd_microstep: 4685.53 | bwd_inner_microstep: 4666.20 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.70 [2024-07-31 
19:50:48,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28028.47 | bwd: 40508.17 | bwd_inner: 39116.88 | bwd_allreduce: 1390.79 | step: 182.40 73%|███████▎ | 898/1230 [17:38:54<6:26:32, 69.86s/it] {'loss': 1.147, 'learning_rate': 3.584459964239e-06, 'epoch': 0.73} 73%|███████▎ | 898/1230 [17:38:54<6:26:32, 69.86s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2376 [2024-07-31 19:50:57,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.35 | bwd_microstep: 5300.14 | bwd_inner_microstep: 4892.30 | bwd_allreduce_microstep: 407.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874 [2024-07-31 19:51:06,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.48 | bwd_microstep: 5315.67 | bwd_inner_microstep: 5266.40 | bwd_allreduce_microstep: 49.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250 [2024-07-31 19:51:15,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.56 | bwd_microstep: 5133.98 | bwd_inner_microstep: 4734.77 | bwd_allreduce_microstep: 399.15 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-31 19:51:24,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.77 | bwd_microstep: 5051.54 | bwd_inner_microstep: 5025.98 | bwd_allreduce_microstep: 25.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 19:51:32,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.88 | bwd_microstep: 5071.61 | bwd_inner_microstep: 5005.89 | bwd_allreduce_microstep: 65.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 19:51:41,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3444.57 | 
bwd_microstep: 5015.03 | bwd_inner_microstep: 4623.84 | bwd_allreduce_microstep: 391.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-31 19:51:49,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.09 | bwd_microstep: 5092.23 | bwd_inner_microstep: 4698.85 | bwd_allreduce_microstep: 393.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2126 [2024-07-31 19:51:58,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 19:51:58,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.95 | bwd_microstep: 5063.17 | bwd_inner_microstep: 4671.89 | bwd_allreduce_microstep: 391.21 | step_microstep: 181.84 [2024-07-31 19:51:58,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28712.56 | bwd: 41043.36 | bwd_inner: 38919.86 | bwd_allreduce: 2123.02 | step: 182.43 73%|███████▎ | 899/1230 [17:40:04<6:25:45, 69.92s/it] {'loss': 1.1399, 'learning_rate': 3.564282347676903e-06, 'epoch': 0.73} 73%|███████▎ | 899/1230 [17:40:04<6:25:45, 69.92s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4059 [2024-07-31 19:52:07,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.59 | bwd_microstep: 5323.53 | bwd_inner_microstep: 5304.35 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242 [2024-07-31 19:52:16,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.74 | bwd_microstep: 5371.56 | bwd_inner_microstep: 4956.18 | bwd_allreduce_microstep: 415.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3802 [2024-07-31 19:52:25,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.50 | bwd_microstep: 5146.08 | bwd_inner_microstep: 5098.88 | 
bwd_allreduce_microstep: 47.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-31 19:52:34,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.57 | bwd_microstep: 5176.09 | bwd_inner_microstep: 5118.62 | bwd_allreduce_microstep: 57.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-31 19:52:43,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.88 | bwd_microstep: 5235.19 | bwd_inner_microstep: 4828.95 | bwd_allreduce_microstep: 406.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 19:52:52,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.30 | bwd_microstep: 5016.73 | bwd_inner_microstep: 4974.59 | bwd_allreduce_microstep: 42.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-31 19:53:00,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.76 | bwd_microstep: 4995.91 | bwd_inner_microstep: 4943.58 | bwd_allreduce_microstep: 52.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2147 [2024-07-31 19:53:09,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 19:53:09,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.81 | bwd_microstep: 5294.77 | bwd_inner_microstep: 4820.73 | bwd_allreduce_microstep: 473.98 | step_microstep: 181.49 [2024-07-31 19:53:09,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29055.06 | bwd: 41559.85 | bwd_inner: 40045.82 | bwd_allreduce: 1513.54 | step: 182.07 73%|███████▎ | 900/1230 [17:41:15<6:26:16, 70.23s/it] {'loss': 1.1472, 'learning_rate': 3.54414935995387e-06, 'epoch': 0.73} 73%|███████▎ | 900/1230 [17:41:15<6:26:16, 70.23s/it]dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 4096 [2024-07-31 19:53:17,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3247.56 | bwd_microstep: 5007.98 | bwd_inner_microstep: 4988.78 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 19:53:26,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.28 | bwd_microstep: 5023.87 | bwd_inner_microstep: 4981.97 | bwd_allreduce_microstep: 41.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 19:53:34,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.56 | bwd_microstep: 4815.49 | bwd_inner_microstep: 4771.60 | bwd_allreduce_microstep: 43.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 19:53:43,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.38 | bwd_microstep: 5155.19 | bwd_inner_microstep: 5082.71 | bwd_allreduce_microstep: 72.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 19:53:52,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.01 | bwd_microstep: 5018.46 | bwd_inner_microstep: 4964.32 | bwd_allreduce_microstep: 54.08 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2177 [2024-07-31 19:54:01,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.08 | bwd_microstep: 5206.70 | bwd_inner_microstep: 4802.06 | bwd_allreduce_microstep: 404.57 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3819 [2024-07-31 19:54:09,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.27 | bwd_microstep: 4980.61 | bwd_inner_microstep: 4948.76 | 
bwd_allreduce_microstep: 31.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 19:54:18,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 19:54:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.69 | bwd_microstep: 4980.78 | bwd_inner_microstep: 4931.37 | bwd_allreduce_microstep: 49.35 | step_microstep: 182.26 [2024-07-31 19:54:18,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28221.74 | bwd: 40189.07 | bwd_inner: 39471.51 | bwd_allreduce: 717.09 | step: 182.83 73%|███████▎ | 901/1230 [17:42:24<6:22:38, 69.78s/it] {'loss': 1.1688, 'learning_rate': 3.524061140683206e-06, 'epoch': 0.73} 73%|███████▎ | 901/1230 [17:42:24<6:22:38, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3854 [2024-07-31 19:54:27,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.41 | bwd_microstep: 5462.46 | bwd_inner_microstep: 5377.10 | bwd_allreduce_microstep: 85.29 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3828 [2024-07-31 19:54:36,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.28 | bwd_microstep: 5290.22 | bwd_inner_microstep: 5238.12 | bwd_allreduce_microstep: 52.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 19:54:45,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.36 | bwd_microstep: 5006.82 | bwd_inner_microstep: 4987.54 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2101 [2024-07-31 19:54:54,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.31 | bwd_microstep: 5218.41 | bwd_inner_microstep: 4810.44 | bwd_allreduce_microstep: 407.90 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 19:55:02,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.51 | bwd_microstep: 4894.44 | bwd_inner_microstep: 4875.11 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 19:55:11,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.34 | bwd_microstep: 5201.27 | bwd_inner_microstep: 4798.42 | bwd_allreduce_microstep: 402.78 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-31 19:55:20,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.78 | bwd_microstep: 5146.25 | bwd_inner_microstep: 4745.68 | bwd_allreduce_microstep: 400.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 19:55:29,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 19:55:29,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.96 | bwd_microstep: 5040.44 | bwd_inner_microstep: 4982.23 | bwd_allreduce_microstep: 58.14 | step_microstep: 421.57 [2024-07-31 19:55:29,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28953.85 | bwd: 41260.30 | bwd_inner: 39814.60 | bwd_allreduce: 1445.22 | step: 422.16 73%|███████▎ | 902/1230 [17:43:35<6:23:07, 70.08s/it] {'loss': 1.1588, 'learning_rate': 3.5040178291677786e-06, 'epoch': 0.73} 73%|███████▎ | 902/1230 [17:43:35<6:23:07, 70.08s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3947 [2024-07-31 19:55:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3873.18 | bwd_microstep: 5441.66 | bwd_inner_microstep: 5384.27 | bwd_allreduce_microstep: 57.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3844 [2024-07-31 
19:55:47,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.57 | bwd_microstep: 5107.79 | bwd_inner_microstep: 5088.40 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 19:55:56,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.97 | bwd_microstep: 4999.60 | bwd_inner_microstep: 4980.17 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-31 19:56:04,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.71 | bwd_microstep: 5137.34 | bwd_inner_microstep: 5063.05 | bwd_allreduce_microstep: 74.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 19:56:13,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.83 | bwd_microstep: 5172.84 | bwd_inner_microstep: 5088.37 | bwd_allreduce_microstep: 84.40 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 19:56:22,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.52 | bwd_microstep: 5131.74 | bwd_inner_microstep: 5053.47 | bwd_allreduce_microstep: 78.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3687 [2024-07-31 19:56:31,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.23 | bwd_microstep: 5030.53 | bwd_inner_microstep: 4963.61 | bwd_allreduce_microstep: 66.85 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2131 [2024-07-31 19:56:39,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 19:56:39,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.17 | bwd_microstep: 5061.61 | 
bwd_inner_microstep: 4667.01 | bwd_allreduce_microstep: 394.52 | step_microstep: 182.10 [2024-07-31 19:56:39,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29330.10 | bwd: 41083.09 | bwd_inner: 40288.30 | bwd_allreduce: 794.30 | step: 182.71 73%|███████▎ | 903/1230 [17:44:45<6:23:02, 70.28s/it] {'loss': 1.129, 'learning_rate': 3.484019564399035e-06, 'epoch': 0.73} 73%|███████▎ | 903/1230 [17:44:45<6:23:02, 70.28s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3899 [2024-07-31 19:56:48,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.53 | bwd_microstep: 5147.85 | bwd_inner_microstep: 5126.03 | bwd_allreduce_microstep: 21.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3871 [2024-07-31 19:56:57,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.15 | bwd_microstep: 5127.64 | bwd_inner_microstep: 5083.44 | bwd_allreduce_microstep: 44.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256 [2024-07-31 19:57:06,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.50 | bwd_microstep: 5179.63 | bwd_inner_microstep: 4776.94 | bwd_allreduce_microstep: 402.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 19:57:15,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.46 | bwd_microstep: 5035.32 | bwd_inner_microstep: 5006.70 | bwd_allreduce_microstep: 28.56 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 19:57:23,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.92 | bwd_microstep: 4801.40 | bwd_inner_microstep: 4763.94 | bwd_allreduce_microstep: 37.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 
19:57:32,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.00 | bwd_microstep: 4995.71 | bwd_inner_microstep: 4976.34 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-31 19:57:40,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.23 | bwd_microstep: 4868.43 | bwd_inner_microstep: 4819.13 | bwd_allreduce_microstep: 49.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-31 19:57:48,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 19:57:48,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.56 | bwd_microstep: 4808.33 | bwd_inner_microstep: 4788.92 | bwd_allreduce_microstep: 19.33 | step_microstep: 181.25 [2024-07-31 19:57:48,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28780.26 | bwd: 39964.29 | bwd_inner: 39341.38 | bwd_allreduce: 622.43 | step: 181.84 73%|███████▎ | 904/1230 [17:45:54<6:19:54, 69.92s/it] {'loss': 1.1715, 'learning_rate': 3.4640664850560514e-06, 'epoch': 0.73} 73%|███████▎ | 904/1230 [17:45:54<6:19:54, 69.92s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2503 [2024-07-31 19:57:58,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.25 | bwd_microstep: 5427.76 | bwd_inner_microstep: 5011.89 | bwd_allreduce_microstep: 415.81 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 19:58:06,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.43 | bwd_microstep: 5018.40 | bwd_inner_microstep: 4999.11 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3966 [2024-07-31 19:58:15,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3313.76 | bwd_microstep: 4990.36 | bwd_inner_microstep: 4971.05 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 19:58:24,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.30 | bwd_microstep: 5060.85 | bwd_inner_microstep: 5035.26 | bwd_allreduce_microstep: 25.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 19:58:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.06 | bwd_microstep: 4993.59 | bwd_inner_microstep: 4971.76 | bwd_allreduce_microstep: 21.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-31 19:58:41,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.28 | bwd_microstep: 5057.93 | bwd_inner_microstep: 4666.48 | bwd_allreduce_microstep: 391.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 19:58:49,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.55 | bwd_microstep: 5019.26 | bwd_inner_microstep: 4969.23 | bwd_allreduce_microstep: 49.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 19:58:58,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 19:58:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.71 | bwd_microstep: 5056.51 | bwd_inner_microstep: 4663.98 | bwd_allreduce_microstep: 392.46 | step_microstep: 181.52 [2024-07-31 19:58:58,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28734.26 | bwd: 40624.64 | bwd_inner: 39288.70 | bwd_allreduce: 1335.45 | step: 182.10 74%|███████▎ | 905/1230 [17:47:04<6:18:21, 69.85s/it] {'loss': 1.1075, 'learning_rate': 3.444158729504549e-06, 'epoch': 
0.74} 74%|███████▎ | 905/1230 [17:47:04<6:18:21, 69.85s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3934 [2024-07-31 19:59:07,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.12 | bwd_microstep: 5312.93 | bwd_inner_microstep: 5268.95 | bwd_allreduce_microstep: 43.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-31 19:59:16,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.66 | bwd_microstep: 5269.58 | bwd_inner_microstep: 5170.94 | bwd_allreduce_microstep: 98.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2244 [2024-07-31 19:59:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3005.65 | bwd_microstep: 4976.87 | bwd_inner_microstep: 4592.79 | bwd_allreduce_microstep: 384.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-31 19:59:33,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.67 | bwd_microstep: 5218.69 | bwd_inner_microstep: 5154.59 | bwd_allreduce_microstep: 64.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 19:59:42,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.87 | bwd_microstep: 5049.31 | bwd_inner_microstep: 5007.19 | bwd_allreduce_microstep: 42.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 19:59:51,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.89 | bwd_microstep: 5138.43 | bwd_inner_microstep: 5085.76 | bwd_allreduce_microstep: 52.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 19:59:59,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3572.69 | bwd_microstep: 4978.13 | bwd_inner_microstep: 4943.40 | bwd_allreduce_microstep: 34.66 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168
[2024-07-31 20:00:08,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 20:00:08,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.55 | bwd_microstep: 5180.25 | bwd_inner_microstep: 4776.52 | bwd_allreduce_microstep: 403.66 | step_microstep: 181.43
[2024-07-31 20:00:08,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28766.01 | bwd: 41124.17 | bwd_inner: 40000.08 | bwd_allreduce: 1123.61 | step: 182.01
74%|███████▎ | 906/1230 [17:48:14<6:17:48, 69.96s/it]
{'loss': 1.2042, 'learning_rate': 3.4242964357959597e-06, 'epoch': 0.74}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 20:00:18,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3899.27 | bwd_microstep: 5409.80 | bwd_inner_microstep: 5390.66 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3862
[2024-07-31 20:00:27,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3818.10 | bwd_microstep: 5154.81 | bwd_inner_microstep: 5129.56 | bwd_allreduce_microstep: 25.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764
[2024-07-31 20:00:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.27 | bwd_microstep: 5121.09 | bwd_inner_microstep: 5075.74 | bwd_allreduce_microstep: 45.28 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166
[2024-07-31 20:00:44,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.06 | bwd_microstep: 5229.76 | bwd_inner_microstep: 4824.21 | bwd_allreduce_microstep: 405.49 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678
[2024-07-31 20:00:52,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.52 | bwd_microstep: 4782.93 | bwd_inner_microstep: 4748.64 | bwd_allreduce_microstep: 34.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650
[2024-07-31 20:01:01,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.45 | bwd_microstep: 5160.02 | bwd_inner_microstep: 5081.48 | bwd_allreduce_microstep: 78.47 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697
[2024-07-31 20:01:09,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.79 | bwd_microstep: 4770.44 | bwd_inner_microstep: 4743.07 | bwd_allreduce_microstep: 27.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161
[2024-07-31 20:01:18,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 20:01:18,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.58 | bwd_microstep: 5132.63 | bwd_inner_microstep: 4733.16 | bwd_allreduce_microstep: 399.39 | step_microstep: 183.37
[2024-07-31 20:01:18,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28459.95 | bwd: 40761.46 | bwd_inner: 39726.47 | bwd_allreduce: 1034.51 | step: 183.96
74%|███████▎ | 907/1230 [17:49:24<6:15:58, 69.84s/it]
{'loss': 1.1493, 'learning_rate': 3.4044797416664564e-06, 'epoch': 0.74}
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2393
[2024-07-31 20:01:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.17 | bwd_microstep: 5238.72 | bwd_inner_microstep: 4833.27 | bwd_allreduce_microstep: 405.38 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782
[2024-07-31 20:01:36,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.66 | bwd_microstep: 5008.71 | bwd_inner_microstep: 4989.36 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3070
[2024-07-31 20:01:44,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.53 | bwd_microstep: 5147.06 | bwd_inner_microstep: 4850.17 | bwd_allreduce_microstep: 296.82 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605
[2024-07-31 20:01:53,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.80 | bwd_microstep: 5118.71 | bwd_inner_microstep: 5040.86 | bwd_allreduce_microstep: 77.79 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 20:02:02,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.40 | bwd_microstep: 5299.26 | bwd_inner_microstep: 5227.23 | bwd_allreduce_microstep: 71.97 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 20:02:11,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.69 | bwd_microstep: 5010.51 | bwd_inner_microstep: 4951.71 | bwd_allreduce_microstep: 58.74 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 20:02:19,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.83 | bwd_microstep: 5136.75 | bwd_inner_microstep: 5070.96 | bwd_allreduce_microstep: 65.72 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147
[2024-07-31 20:02:27,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 20:02:27,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.58 | bwd_microstep: 4913.34 | bwd_inner_microstep: 4536.01 | bwd_allreduce_microstep: 377.26 | step_microstep: 181.90
[2024-07-31 20:02:27,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28265.56 | bwd: 40873.04 | bwd_inner: 39499.50 | bwd_allreduce: 1373.05 | step: 182.61
74%|███████▍ | 908/1230 [17:50:33<6:14:12, 69.73s/it]
{'loss': 1.1522, 'learning_rate': 3.3847087845359996e-06, 'epoch': 0.74}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3900
[2024-07-31 20:02:37,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.17 | bwd_microstep: 5387.24 | bwd_inner_microstep: 5315.81 | bwd_allreduce_microstep: 71.36 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3942
[2024-07-31 20:02:46,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.49 | bwd_microstep: 5152.29 | bwd_inner_microstep: 5132.95 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777
[2024-07-31 20:02:54,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.29 | bwd_microstep: 5069.79 | bwd_inner_microstep: 5045.34 | bwd_allreduce_microstep: 24.37 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2243
[2024-07-31 20:03:03,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.48 | bwd_microstep: 5233.25 | bwd_inner_microstep: 4825.89 | bwd_allreduce_microstep: 407.29 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208
[2024-07-31 20:03:12,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.10 | bwd_microstep: 5159.08 | bwd_inner_microstep: 4757.38 | bwd_allreduce_microstep: 401.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700
[2024-07-31 20:03:21,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.45 | bwd_microstep: 5190.95 | bwd_inner_microstep: 5109.83 | bwd_allreduce_microstep: 81.06 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 20:03:30,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.17 | bwd_microstep: 5162.36 | bwd_inner_microstep: 5089.97 | bwd_allreduce_microstep: 72.31 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655
[2024-07-31 20:03:38,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53
[2024-07-31 20:03:38,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.37 | bwd_microstep: 4850.36 | bwd_inner_microstep: 4805.14 | bwd_allreduce_microstep: 45.15 | step_microstep: 182.19
[2024-07-31 20:03:38,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28815.44 | bwd: 41205.30 | bwd_inner: 40082.24 | bwd_allreduce: 1122.54 | step: 182.79
74%|███████▍ | 909/1230 [17:51:44<6:14:03, 69.92s/it]
{'loss': 1.1815, 'learning_rate': 3.364983701507376e-06, 'epoch': 0.74}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 20:03:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3877.33 | bwd_microstep: 5400.66 | bwd_inner_microstep: 5381.61 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2337
[2024-07-31 20:03:55,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.52 | bwd_microstep: 4979.66 | bwd_inner_microstep: 4596.54 | bwd_allreduce_microstep: 383.05 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250
[2024-07-31 20:04:04,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.47 | bwd_microstep: 5211.34 | bwd_inner_microstep: 4806.32 | bwd_allreduce_microstep: 404.95 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 20:04:13,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.30 | bwd_microstep: 5006.72 | bwd_inner_microstep: 4985.09 | bwd_allreduce_microstep: 21.56 | step_microstep: 0.18
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2220
[2024-07-31 20:04:22,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.32 | bwd_microstep: 5356.54 | bwd_inner_microstep: 4940.79 | bwd_allreduce_microstep: 415.68 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2116
[2024-07-31 20:04:30,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.80 | bwd_microstep: 5192.85 | bwd_inner_microstep: 4788.92 | bwd_allreduce_microstep: 403.86 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3146
[2024-07-31 20:04:39,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.51 | bwd_microstep: 5159.56 | bwd_inner_microstep: 4884.72 | bwd_allreduce_microstep: 274.77 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679
[2024-07-31 20:04:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 20:04:48,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.34 | bwd_microstep: 4870.17 | bwd_inner_microstep: 4850.74 | bwd_allreduce_microstep: 19.36 | step_microstep: 182.16
[2024-07-31 20:04:48,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28622.48 | bwd: 41177.47 | bwd_inner: 39234.65 | bwd_allreduce: 1942.32 | step: 182.86
74%|███████▍ | 910/1230 [17:52:54<6:13:14, 69.98s/it]
{'loss': 1.1356, 'learning_rate': 3.3453046293652657e-06, 'epoch': 0.74}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4026
[2024-07-31 20:04:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3852.32 | bwd_microstep: 5276.30 | bwd_inner_microstep: 5257.09 | bwd_allreduce_microstep: 19.14 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758
[2024-07-31 20:05:06,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.44 | bwd_microstep: 5183.37 | bwd_inner_microstep: 5124.20 | bwd_allreduce_microstep: 59.11 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782
[2024-07-31 20:05:15,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.99 | bwd_microstep: 5090.71 | bwd_inner_microstep: 5071.30 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616
[2024-07-31 20:05:24,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.91 | bwd_microstep: 5114.51 | bwd_inner_microstep: 5043.30 | bwd_allreduce_microstep: 71.14 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3619
[2024-07-31 20:05:32,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.59 | bwd_microstep: 5197.73 | bwd_inner_microstep: 5105.85 | bwd_allreduce_microstep: 91.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675
[2024-07-31 20:05:41,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.38 | bwd_microstep: 5063.62 | bwd_inner_microstep: 5004.74 | bwd_allreduce_microstep: 58.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 20:05:49,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3204.90 | bwd_microstep: 4709.13 | bwd_inner_microstep: 4687.10 | bwd_allreduce_microstep: 21.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3691
[2024-07-31 20:05:57,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 20:05:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3088.76 | bwd_microstep: 4860.44 | bwd_inner_microstep: 4812.61 | bwd_allreduce_microstep: 47.76 | step_microstep: 182.38
[2024-07-31 20:05:57,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28339.18 | bwd: 40495.79 | bwd_inner: 40106.12 | bwd_allreduce: 389.19 | step: 182.97
74%|███████▍ | 911/1230 [17:54:03<6:10:46, 69.74s/it]
{'loss': 1.1131, 'learning_rate': 3.3256717045752794e-06, 'epoch': 0.74}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832
[2024-07-31 20:06:06,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3825.24 | bwd_microstep: 5305.12 | bwd_inner_microstep: 5261.49 | bwd_allreduce_microstep: 43.57 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585
[2024-07-31 20:06:15,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.39 | bwd_microstep: 5124.43 | bwd_inner_microstep: 5054.06 | bwd_allreduce_microstep: 70.31 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2068
[2024-07-31 20:06:24,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.94 | bwd_microstep: 5268.82 | bwd_inner_microstep: 4859.98 | bwd_allreduce_microstep: 408.77 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193
[2024-07-31 20:06:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.95 | bwd_microstep: 5054.14 | bwd_inner_microstep: 4661.18 | bwd_allreduce_microstep: 392.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740
[2024-07-31 20:06:41,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.26 | bwd_microstep: 4992.15 | bwd_inner_microstep: 4972.66 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685
[2024-07-31 20:06:50,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.86 | bwd_microstep: 5151.85 | bwd_inner_microstep: 5079.30 | bwd_allreduce_microstep: 72.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718
[2024-07-31 20:06:58,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.96 | bwd_microstep: 4985.69 | bwd_inner_microstep: 4966.27 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659
[2024-07-31 20:07:07,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 20:07:07,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.41 | bwd_microstep: 4993.48 | bwd_inner_microstep: 4941.68 | bwd_allreduce_microstep: 51.72 | step_microstep: 182.35
[2024-07-31 20:07:07,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28817.91 | bwd: 40875.65 | bwd_inner: 39796.55 | bwd_allreduce: 1078.59 | step: 183.06
74%|███████▍ | 912/1230 [17:55:13<6:10:04, 69.83s/it]
{'loss': 1.1695, 'learning_rate': 3.30608506328302e-06, 'epoch': 0.74}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3938
[2024-07-31 20:07:16,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3821.96 | bwd_microstep: 5206.69 | bwd_inner_microstep: 5187.51 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3850
[2024-07-31 20:07:25,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3822.85 | bwd_microstep: 5169.40 | bwd_inner_microstep: 5139.14 | bwd_allreduce_microstep: 30.20 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759
[2024-07-31 20:07:34,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.04 | bwd_microstep: 5059.40 | bwd_inner_microstep: 5033.20 | bwd_allreduce_microstep: 26.13 | step_microstep: 0.08
dynamic ViT batch size: 7, images per sample: 3.5, dynamic token length: 1339
[2024-07-31 20:07:42,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.75 | bwd_microstep: 5062.79 | bwd_inner_microstep: 4675.22 | bwd_allreduce_microstep: 387.51 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2096
[2024-07-31 20:07:51,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.79 | bwd_microstep: 5229.46 | bwd_inner_microstep: 4825.40 | bwd_allreduce_microstep: 403.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662
[2024-07-31 20:08:00,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.65 | bwd_microstep: 5056.79 | bwd_inner_microstep: 4994.61 | bwd_allreduce_microstep: 62.11 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659
[2024-07-31 20:08:08,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.80 | bwd_microstep: 5104.05 | bwd_inner_microstep: 5034.49 | bwd_allreduce_microstep: 69.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660
[2024-07-31 20:08:17,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 20:08:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.21 | bwd_microstep: 5045.10 | bwd_inner_microstep: 4984.87 | bwd_allreduce_microstep: 60.15 | step_microstep: 181.28
[2024-07-31 20:08:17,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28740.94 | bwd: 40933.67 | bwd_inner: 39874.38 | bwd_allreduce: 1058.81 | step: 181.86
74%|███████▍ | 913/1230 [17:56:23<6:09:12, 69.88s/it]
{'loss': 1.1028, 'learning_rate': 3.286544841313123e-06, 'epoch': 0.74}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742
[2024-07-31 20:08:26,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.36 | bwd_microstep: 5499.05 | bwd_inner_microstep: 5406.86 | bwd_allreduce_microstep: 92.12 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266
[2024-07-31 20:08:35,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.54 | bwd_microstep: 5080.56 | bwd_inner_microstep: 4688.69 | bwd_allreduce_microstep: 391.80 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795
[2024-07-31 20:08:43,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.99 | bwd_microstep: 5036.42 | bwd_inner_microstep: 5017.05 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799
[2024-07-31 20:08:52,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.13 | bwd_microstep: 5341.28 | bwd_inner_microstep: 5267.42 | bwd_allreduce_microstep: 73.79 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 20:09:01,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.33 | bwd_microstep: 5002.58 | bwd_inner_microstep: 4978.41 | bwd_allreduce_microstep: 24.10 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675
[2024-07-31 20:09:10,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.42 | bwd_microstep: 5016.06 | bwd_inner_microstep: 4964.60 | bwd_allreduce_microstep: 51.39 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 20:09:18,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.37 | bwd_microstep: 5105.54 | bwd_inner_microstep: 5039.10 | bwd_allreduce_microstep: 66.36 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3672
[2024-07-31 20:09:27,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 20:09:27,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.78 | bwd_microstep: 5067.62 | bwd_inner_microstep: 4994.36 | bwd_allreduce_microstep: 73.20 | step_microstep: 181.86
[2024-07-31 20:09:27,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28700.82 | bwd: 41149.09 | bwd_inner: 40356.42 | bwd_allreduce: 792.17 | step: 182.46
74%|███████▍ | 914/1230 [17:57:33<6:08:30, 69.97s/it]
{'loss': 1.1548, 'learning_rate': 3.2670511741683475e-06, 'epoch': 0.74}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3482
[2024-07-31 20:09:36,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.45 | bwd_microstep: 5412.43 | bwd_inner_microstep: 5234.16 | bwd_allreduce_microstep: 178.20 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692
[2024-07-31 20:09:45,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.76 | bwd_microstep: 5209.91 | bwd_inner_microstep: 5125.38 | bwd_allreduce_microstep: 84.46 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2053
[2024-07-31 20:09:53,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.98 | bwd_microstep: 4974.06 | bwd_inner_microstep: 4591.92 | bwd_allreduce_microstep: 382.06 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228
[2024-07-31 20:10:02,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.13 | bwd_microstep: 5227.62 | bwd_inner_microstep: 4820.08 | bwd_allreduce_microstep: 407.48 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705
[2024-07-31 20:10:11,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.67 | bwd_microstep: 5058.46 | bwd_inner_microstep: 4997.75 | bwd_allreduce_microstep: 60.65 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172
[2024-07-31 20:10:19,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3033.96 | bwd_microstep: 4906.66 | bwd_inner_microstep: 4529.99 | bwd_allreduce_microstep: 376.60 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683
[2024-07-31 20:10:27,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.27 | bwd_microstep: 4924.97 | bwd_inner_microstep: 4895.16 | bwd_allreduce_microstep: 29.74 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669
[2024-07-31 20:10:36,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 20:10:36,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.40 | bwd_microstep: 5166.14 | bwd_inner_microstep: 5070.84 | bwd_allreduce_microstep: 95.23 | step_microstep: 182.02
[2024-07-31 20:10:36,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27842.53 | bwd: 40880.22 | bwd_inner: 39265.21 | bwd_allreduce: 1614.53 | step: 182.72
74%|███████▍ | 915/1230 [17:58:42<6:05:54, 69.70s/it]
{'loss': 1.1038, 'learning_rate': 3.2476041970285945e-06, 'epoch': 0.74}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3552
[2024-07-31 20:10:45,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.51 | bwd_microstep: 5116.50 | bwd_inner_microstep: 5038.19 | bwd_allreduce_microstep: 78.25 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2053
[2024-07-31 20:10:54,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.51 | bwd_microstep: 5242.70 | bwd_inner_microstep: 4835.80 | bwd_allreduce_microstep: 406.83 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3787
[2024-07-31 20:11:03,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.91 | bwd_microstep: 5339.59 | bwd_inner_microstep: 5252.26 | bwd_allreduce_microstep: 87.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607
[2024-07-31 20:11:11,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.54 | bwd_microstep: 5073.07 | bwd_inner_microstep: 5010.03 | bwd_allreduce_microstep: 62.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 20:11:20,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.45 | bwd_microstep: 5167.15 | bwd_inner_microstep: 4765.19 | bwd_allreduce_microstep: 401.89 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720
[2024-07-31 20:11:29,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.85 | bwd_microstep: 4986.79 | bwd_inner_microstep: 4967.46 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678
[2024-07-31 20:11:37,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.79 | bwd_microstep: 4856.20 | bwd_inner_microstep: 4816.13 | bwd_allreduce_microstep: 40.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677
[2024-07-31 20:11:46,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 20:11:46,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.17 | bwd_microstep: 5147.26 | bwd_inner_microstep: 5072.80 | bwd_allreduce_microstep: 74.39 | step_microstep: 181.46
[2024-07-31 20:11:46,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28099.64 | bwd: 40929.23 | bwd_inner: 39757.79 | bwd_allreduce: 1170.97 | step: 182.03
74%|███████▍ | 916/1230 [17:59:52<6:04:13, 69.60s/it]
{'loss': 1.1464, 'learning_rate': 3.2282040447500063e-06, 'epoch': 0.74}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255
[2024-07-31 20:11:54,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3102.57 | bwd_microstep: 5140.99 | bwd_inner_microstep: 4749.24 | bwd_allreduce_microstep: 391.68 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3858
[2024-07-31 20:12:03,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.03 | bwd_microstep: 5173.70 | bwd_inner_microstep: 5121.38 | bwd_allreduce_microstep: 52.25 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779
[2024-07-31 20:12:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.53 | bwd_microstep: 5316.25 | bwd_inner_microstep: 5261.96 | bwd_allreduce_microstep: 54.22 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 20:12:21,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.41 | bwd_microstep: 5029.08 | bwd_inner_microstep: 5002.23 | bwd_allreduce_microstep: 26.78 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3617
[2024-07-31 20:12:30,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.03 | bwd_microstep: 5191.47 | bwd_inner_microstep: 5088.43 | bwd_allreduce_microstep: 102.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185
[2024-07-31 20:12:38,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.78 | bwd_microstep: 5216.19 | bwd_inner_microstep: 4808.76 | bwd_allreduce_microstep: 407.36 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 20:12:47,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.28 | bwd_microstep: 5120.50 | bwd_inner_microstep: 5051.24 | bwd_allreduce_microstep: 69.19 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689
[2024-07-31 20:12:56,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 20:12:56,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.01 | bwd_microstep: 4988.36 | bwd_inner_microstep: 4939.64 | bwd_allreduce_microstep: 48.66 | step_microstep: 182.78
[2024-07-31 20:12:56,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28644.54 | bwd: 41176.52 | bwd_inner: 40022.83 | bwd_allreduce: 1153.20 | step: 183.40
75%|███████▍ | 917/1230 [18:01:02<6:03:56, 69.76s/it]
{'loss': 1.1932, 'learning_rate': 3.208850851863998e-06, 'epoch': 0.75}
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2281
[2024-07-31 20:13:04,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.13 | bwd_microstep: 5213.47 | bwd_inner_microstep: 4812.56 | bwd_allreduce_microstep: 400.84 | step_microstep: 0.20
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579
[2024-07-31 20:13:12,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.62 | bwd_microstep: 4923.03 | bwd_inner_microstep: 4861.48 | bwd_allreduce_microstep: 61.49 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794
[2024-07-31 20:13:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.03 | bwd_microstep: 5036.37 | bwd_inner_microstep: 5016.10 | bwd_allreduce_microstep: 20.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611
[2024-07-31 20:13:29,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.01 | bwd_microstep: 4839.57 | bwd_inner_microstep: 4793.69 | bwd_allreduce_microstep: 45.77 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760
[2024-07-31 20:13:38,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.33 | bwd_microstep: 5146.33 | bwd_inner_microstep: 5100.47 | bwd_allreduce_microstep: 45.80 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735
[2024-07-31 20:13:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.48 | bwd_microstep: 5024.96 | bwd_inner_microstep: 5001.89 | bwd_allreduce_microstep: 23.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694
[2024-07-31 20:13:55,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.04 | bwd_microstep: 4924.04 | bwd_inner_microstep: 4884.51 | bwd_allreduce_microstep: 39.47 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1127
[2024-07-31 20:14:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 20:14:04,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.09 | bwd_microstep: 5224.57 | bwd_inner_microstep: 4821.78 | bwd_allreduce_microstep: 402.72 | step_microstep: 181.55
[2024-07-31 20:14:04,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27678.64 | bwd: 40332.32 | bwd_inner: 39292.40 | bwd_allreduce: 1039.40 | step: 182.26
75%|███████▍ | 918/1230 [18:02:10<6:00:33, 69.34s/it]
{'loss': 1.1167, 'learning_rate': 3.189544752576369e-06, 'epoch': 0.75}
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3412
[2024-07-31 20:14:13,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.74 | bwd_microstep: 5166.82 | bwd_inner_microstep: 5028.30 | bwd_allreduce_microstep: 138.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218
[2024-07-31 20:14:22,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.95 | bwd_microstep: 5240.56 | bwd_inner_microstep: 4833.84 | bwd_allreduce_microstep: 406.65 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3600
[2024-07-31 20:14:31,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.52 | bwd_microstep: 5132.48 | bwd_inner_microstep: 5036.89 | bwd_allreduce_microstep: 95.53 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731
[2024-07-31 20:14:39,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.77 | bwd_microstep: 4986.50 | bwd_inner_microstep: 4967.11 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627
[2024-07-31 20:14:47,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.86 | bwd_microstep: 4787.37 | bwd_inner_microstep: 4745.88 | bwd_allreduce_microstep: 41.43 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778
[2024-07-31 20:14:56,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.01 | bwd_microstep: 5035.29 | bwd_inner_microstep: 5015.90 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697
[2024-07-31 20:15:05,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.26 | bwd_microstep: 4888.79 | bwd_inner_microstep: 4869.52 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3707
[2024-07-31 20:15:14,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 20:15:14,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.42 | bwd_microstep: 5068.12 | bwd_inner_microstep: 4997.48 | bwd_allreduce_microstep: 70.57 | step_microstep: 181.32
[2024-07-31 20:15:14,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28709.44 | bwd: 40305.91 | bwd_inner: 39494.87 | bwd_allreduce: 810.56 | step: 181.90
75%|███████▍ | 919/1230 [18:03:19<5:59:24, 69.34s/it]
{'loss': 1.2171, 'learning_rate': 3.1702858807663175e-06, 'epoch': 0.75}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4018
[2024-07-31 20:15:23,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.85 | bwd_microstep: 5262.72 | bwd_inner_microstep: 5243.53 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2201
[2024-07-31 20:15:31,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.32 | bwd_microstep: 5236.41 | bwd_inner_microstep: 4831.37 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586
[2024-07-31 20:15:40,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.10 | bwd_microstep: 4929.22 | bwd_inner_microstep: 4870.02 | bwd_allreduce_microstep: 59.14 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 20:15:48,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.44 | bwd_microstep: 5148.57 | bwd_inner_microstep: 5073.88 | bwd_allreduce_microstep: 74.62 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3633
[2024-07-31 20:15:57,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.98 | bwd_microstep: 5201.55 | bwd_inner_microstep: 5098.77 | bwd_allreduce_microstep: 102.72 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637
[2024-07-31 20:16:06,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.14 | bwd_microstep: 4996.20 | bwd_inner_microstep: 4938.49 | bwd_allreduce_microstep: 57.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636
[2024-07-31 20:16:14,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.50 | bwd_microstep: 5035.31 | bwd_inner_microstep: 4973.02 | bwd_allreduce_microstep: 62.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692
[2024-07-31 20:16:23,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 20:16:23,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.81 | bwd_microstep: 5163.45 | bwd_inner_microstep: 5085.58 | bwd_allreduce_microstep: 77.80 | step_microstep: 181.82
[2024-07-31 20:16:23,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28565.06 | bwd: 40973.42 | bwd_inner: 40114.60 | bwd_allreduce: 858.34 | step: 182.42
75%|███████▍ | 920/1230 [18:04:29<5:59:04, 69.50s/it]
{'loss': 1.0843, 'learning_rate': 3.1510743699855596e-06, 'epoch': 0.75}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3532
[2024-07-31 20:16:33,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.46 | bwd_microstep: 5499.94 | bwd_inner_microstep: 5337.75 | bwd_allreduce_microstep: 162.11 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274
[2024-07-31 20:16:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2990.21 | bwd_microstep: 4806.86 | bwd_inner_microstep: 4436.25 | bwd_allreduce_microstep: 370.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3753
[2024-07-31 20:16:49,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.63 | bwd_microstep: 4983.65 | bwd_inner_microstep: 4964.17 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195
[2024-07-31 20:16:58,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.49 | bwd_microstep: 5252.99 | bwd_inner_microstep: 4846.75 | bwd_allreduce_microstep: 406.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761
[2024-07-31 20:17:07,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.81 | bwd_microstep: 5091.06 | bwd_inner_microstep: 5045.39 | bwd_allreduce_microstep: 45.60 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3713
[2024-07-31 20:17:16,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.93 | bwd_microstep: 5188.85 | bwd_inner_microstep: 5121.55 | 
bwd_allreduce_microstep: 67.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2136 [2024-07-31 20:17:23,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3012.11 | bwd_microstep: 4880.96 | bwd_inner_microstep: 4506.62 | bwd_allreduce_microstep: 374.27 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3659 [2024-07-31 20:17:32,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 20:17:32,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.50 | bwd_microstep: 4924.41 | bwd_inner_microstep: 4896.56 | bwd_allreduce_microstep: 27.77 | step_microstep: 181.22 [2024-07-31 20:17:32,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27857.05 | bwd: 40628.70 | bwd_inner: 39154.98 | bwd_allreduce: 1473.22 | step: 181.93 75%|███████▍ | 921/1230 [18:05:38<5:56:52, 69.29s/it] {'loss': 1.1415, 'learning_rate': 3.131910353457366e-06, 'epoch': 0.75} 75%|███████▍ | 921/1230 [18:05:38<5:56:52, 69.29s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 20:17:41,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.68 | bwd_microstep: 5436.06 | bwd_inner_microstep: 5332.22 | bwd_allreduce_microstep: 103.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-31 20:17:50,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.91 | bwd_microstep: 5483.40 | bwd_inner_microstep: 5058.22 | bwd_allreduce_microstep: 425.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-31 20:17:59,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.84 | bwd_microstep: 5093.56 | bwd_inner_microstep: 5063.91 | bwd_allreduce_microstep: 29.58 | step_microstep: 0.10 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3881 [2024-07-31 20:18:08,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.51 | bwd_microstep: 5125.31 | bwd_inner_microstep: 5106.05 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 20:18:17,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.64 | bwd_microstep: 5004.64 | bwd_inner_microstep: 4985.28 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 20:18:26,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.11 | bwd_microstep: 5169.13 | bwd_inner_microstep: 4769.26 | bwd_allreduce_microstep: 399.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-31 20:18:34,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.56 | bwd_microstep: 5080.89 | bwd_inner_microstep: 5020.57 | bwd_allreduce_microstep: 60.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 20:18:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 20:18:43,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.70 | bwd_microstep: 5064.88 | bwd_inner_microstep: 5005.09 | bwd_allreduce_microstep: 59.72 | step_microstep: 181.88 [2024-07-31 20:18:43,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29259.86 | bwd: 41457.86 | bwd_inner: 40340.55 | bwd_allreduce: 1116.82 | step: 182.47 75%|███████▍ | 922/1230 [18:06:49<5:58:25, 69.82s/it] {'loss': 1.1127, 'learning_rate': 3.112793964075681e-06, 'epoch': 0.75} 75%|███████▍ | 922/1230 [18:06:49<5:58:25, 69.82s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3860 [2024-07-31 
20:18:52,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.98 | bwd_microstep: 5437.89 | bwd_inner_microstep: 5353.61 | bwd_allreduce_microstep: 84.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3826 [2024-07-31 20:19:01,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.09 | bwd_microstep: 5248.98 | bwd_inner_microstep: 5194.01 | bwd_allreduce_microstep: 54.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 20:19:10,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.60 | bwd_microstep: 5163.04 | bwd_inner_microstep: 5109.01 | bwd_allreduce_microstep: 53.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 20:19:19,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.39 | bwd_microstep: 5177.18 | bwd_inner_microstep: 5095.61 | bwd_allreduce_microstep: 81.50 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3710 [2024-07-31 20:19:28,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.31 | bwd_microstep: 5049.12 | bwd_inner_microstep: 4971.64 | bwd_allreduce_microstep: 77.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 20:19:36,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.48 | bwd_microstep: 4935.96 | bwd_inner_microstep: 4890.65 | bwd_allreduce_microstep: 45.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680 [2024-07-31 20:19:45,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.70 | bwd_microstep: 4924.86 | bwd_inner_microstep: 4899.80 | bwd_allreduce_microstep: 24.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3690 [2024-07-31 20:19:54,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 20:19:54,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.52 | bwd_microstep: 5042.77 | bwd_inner_microstep: 4985.00 | bwd_allreduce_microstep: 57.70 | step_microstep: 181.52 [2024-07-31 20:19:54,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28938.99 | bwd: 40979.78 | bwd_inner: 40499.28 | bwd_allreduce: 480.02 | step: 182.10 75%|███████▌ | 923/1230 [18:07:59<5:57:55, 69.95s/it] {'loss': 1.1563, 'learning_rate': 3.0937253344041507e-06, 'epoch': 0.75} 75%|███████▌ | 923/1230 [18:07:59<5:57:55, 69.95s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3978 [2024-07-31 20:20:03,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.77 | bwd_microstep: 5264.99 | bwd_inner_microstep: 5245.16 | bwd_allreduce_microstep: 19.75 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3841 [2024-07-31 20:20:12,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.09 | bwd_microstep: 5241.96 | bwd_inner_microstep: 5169.91 | bwd_allreduce_microstep: 71.98 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2226 [2024-07-31 20:20:20,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.51 | bwd_microstep: 5172.01 | bwd_inner_microstep: 4771.36 | bwd_allreduce_microstep: 400.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 20:20:29,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.01 | bwd_microstep: 5179.20 | bwd_inner_microstep: 5119.60 | bwd_allreduce_microstep: 59.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-31 20:20:38,258] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.43 | bwd_microstep: 5028.16 | bwd_inner_microstep: 4971.89 | bwd_allreduce_microstep: 56.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 20:20:46,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.24 | bwd_microstep: 4954.82 | bwd_inner_microstep: 4910.62 | bwd_allreduce_microstep: 44.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 20:20:55,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.41 | bwd_microstep: 5029.09 | bwd_inner_microstep: 4971.87 | bwd_allreduce_microstep: 57.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 20:21:04,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 20:21:04,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.71 | bwd_microstep: 5038.84 | bwd_inner_microstep: 4982.81 | bwd_allreduce_microstep: 55.96 | step_microstep: 181.09 [2024-07-31 20:21:04,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28860.07 | bwd: 40909.04 | bwd_inner: 40143.17 | bwd_allreduce: 765.39 | step: 181.66 75%|███████▌ | 924/1230 [18:09:10<5:56:58, 70.00s/it] {'loss': 1.1567, 'learning_rate': 3.074704596675242e-06, 'epoch': 0.75} 75%|███████▌ | 924/1230 [18:09:10<5:56:58, 70.00s/it]dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3528 [2024-07-31 20:21:13,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.19 | bwd_microstep: 5224.78 | bwd_inner_microstep: 5145.82 | bwd_allreduce_microstep: 78.90 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2050 [2024-07-31 20:21:21,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.33 | 
bwd_microstep: 5262.94 | bwd_inner_microstep: 4854.89 | bwd_allreduce_microstep: 407.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-31 20:21:30,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.05 | bwd_microstep: 5136.91 | bwd_inner_microstep: 5089.96 | bwd_allreduce_microstep: 46.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 20:21:39,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.59 | bwd_microstep: 5084.39 | bwd_inner_microstep: 5039.84 | bwd_allreduce_microstep: 44.49 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-31 20:21:48,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.32 | bwd_microstep: 5113.80 | bwd_inner_microstep: 5066.19 | bwd_allreduce_microstep: 47.54 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 20:21:56,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.49 | bwd_microstep: 4998.98 | bwd_inner_microstep: 4979.64 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2153 [2024-07-31 20:22:05,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.36 | bwd_microstep: 5110.60 | bwd_inner_microstep: 4713.37 | bwd_allreduce_microstep: 397.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 20:22:14,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 20:22:14,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.26 | bwd_microstep: 5093.92 | bwd_inner_microstep: 4698.60 | bwd_allreduce_microstep: 395.25 | step_microstep: 181.49 [2024-07-31 
20:22:14,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28689.49 | bwd: 41026.31 | bwd_inner: 39588.25 | bwd_allreduce: 1437.57 | step: 182.10 75%|███████▌ | 925/1230 [18:10:20<5:55:53, 70.01s/it] {'loss': 1.1468, 'learning_rate': 3.055731882789311e-06, 'epoch': 0.75} 75%|███████▌ | 925/1230 [18:10:20<5:55:53, 70.01s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3951 [2024-07-31 20:22:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.93 | bwd_microstep: 5155.12 | bwd_inner_microstep: 5136.10 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569 [2024-07-31 20:22:32,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.16 | bwd_microstep: 5325.94 | bwd_inner_microstep: 5192.59 | bwd_allreduce_microstep: 133.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-31 20:22:40,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.85 | bwd_microstep: 5179.15 | bwd_inner_microstep: 5125.61 | bwd_allreduce_microstep: 53.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-31 20:22:48,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.45 | bwd_microstep: 5011.14 | bwd_inner_microstep: 4624.95 | bwd_allreduce_microstep: 386.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-31 20:22:57,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.25 | bwd_microstep: 5036.23 | bwd_inner_microstep: 5011.08 | bwd_allreduce_microstep: 25.08 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3634 [2024-07-31 20:23:06,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.74 | 
bwd_microstep: 5083.58 | bwd_inner_microstep: 5002.42 | bwd_allreduce_microstep: 81.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 20:23:15,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.18 | bwd_microstep: 4880.84 | bwd_inner_microstep: 4861.47 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-31 20:23:23,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 20:23:23,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.09 | bwd_microstep: 5117.62 | bwd_inner_microstep: 4721.59 | bwd_allreduce_microstep: 395.96 | step_microstep: 181.52 [2024-07-31 20:23:23,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28571.56 | bwd: 40789.62 | bwd_inner: 39675.76 | bwd_allreduce: 1113.37 | step: 182.21 75%|███████▌ | 926/1230 [18:11:29<5:54:14, 69.92s/it] {'loss': 1.1656, 'learning_rate': 3.036807324313691e-06, 'epoch': 0.75} 75%|███████▌ | 926/1230 [18:11:29<5:54:14, 69.92s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4077 [2024-07-31 20:23:33,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3883.28 | bwd_microstep: 5355.42 | bwd_inner_microstep: 5331.67 | bwd_allreduce_microstep: 23.68 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3664 [2024-07-31 20:23:41,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.03 | bwd_microstep: 4872.59 | bwd_inner_microstep: 4850.33 | bwd_allreduce_microstep: 22.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2321 [2024-07-31 20:23:50,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.38 | bwd_microstep: 5206.87 | bwd_inner_microstep: 4803.85 | 
bwd_allreduce_microstep: 402.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 20:23:59,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.29 | bwd_microstep: 5163.30 | bwd_inner_microstep: 5082.81 | bwd_allreduce_microstep: 80.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 20:24:07,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.04 | bwd_microstep: 5067.53 | bwd_inner_microstep: 5003.47 | bwd_allreduce_microstep: 63.99 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2187 [2024-07-31 20:24:16,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.79 | bwd_microstep: 5208.93 | bwd_inner_microstep: 4804.76 | bwd_allreduce_microstep: 404.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 20:24:25,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.47 | bwd_microstep: 5064.45 | bwd_inner_microstep: 4999.27 | bwd_allreduce_microstep: 65.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 20:24:34,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 20:24:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.99 | bwd_microstep: 5050.77 | bwd_inner_microstep: 4991.83 | bwd_allreduce_microstep: 58.87 | step_microstep: 181.44 [2024-07-31 20:24:34,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28956.16 | bwd: 40989.84 | bwd_inner: 39867.93 | bwd_allreduce: 1121.43 | step: 182.03 75%|███████▌ | 927/1230 [18:12:40<5:53:37, 70.02s/it] {'loss': 1.1455, 'learning_rate': 3.0179310524817707e-06, 'epoch': 0.75} 75%|███████▌ | 927/1230 [18:12:40<5:53:37, 70.02s/it]dynamic ViT batch 
size: 6, images per sample: 3.0, dynamic token length: 1466 [2024-07-31 20:24:43,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.21 | bwd_microstep: 5361.08 | bwd_inner_microstep: 4947.59 | bwd_allreduce_microstep: 413.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3833 [2024-07-31 20:24:51,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.97 | bwd_microstep: 5212.85 | bwd_inner_microstep: 5155.69 | bwd_allreduce_microstep: 57.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225 [2024-07-31 20:25:00,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.87 | bwd_microstep: 5328.96 | bwd_inner_microstep: 4917.41 | bwd_allreduce_microstep: 411.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3721 [2024-07-31 20:25:09,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.72 | bwd_microstep: 5174.33 | bwd_inner_microstep: 5086.02 | bwd_allreduce_microstep: 88.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 20:25:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.58 | bwd_microstep: 5005.91 | bwd_inner_microstep: 4986.60 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-31 20:25:27,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.47 | bwd_microstep: 4898.26 | bwd_inner_microstep: 4878.93 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 20:25:35,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.11 | bwd_microstep: 5027.18 | bwd_inner_microstep: 4969.43 | 
bwd_allreduce_microstep: 57.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 20:25:44,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-31 20:25:44,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.77 | bwd_microstep: 4897.85 | bwd_inner_microstep: 4878.49 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.85 [2024-07-31 20:25:44,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29050.61 | bwd: 40906.38 | bwd_inner: 39820.10 | bwd_allreduce: 1085.79 | step: 182.42 75%|███████▌ | 928/1230 [18:13:50<5:52:50, 70.10s/it] {'loss': 1.1439, 'learning_rate': 2.9991031981921026e-06, 'epoch': 0.75} 75%|███████▌ | 928/1230 [18:13:50<5:52:50, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3835 [2024-07-31 20:25:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.81 | bwd_microstep: 5574.61 | bwd_inner_microstep: 5471.88 | bwd_allreduce_microstep: 102.67 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 1784 [2024-07-31 20:26:02,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.04 | bwd_microstep: 5355.41 | bwd_inner_microstep: 4944.18 | bwd_allreduce_microstep: 411.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3625 [2024-07-31 20:26:11,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.54 | bwd_microstep: 5105.01 | bwd_inner_microstep: 5017.82 | bwd_allreduce_microstep: 87.12 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3813 [2024-07-31 20:26:20,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.58 | bwd_microstep: 5045.06 | bwd_inner_microstep: 5025.51 | bwd_allreduce_microstep: 19.46 | step_microstep: 0.11 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 20:26:29,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.72 | bwd_microstep: 5171.02 | bwd_inner_microstep: 5097.57 | bwd_allreduce_microstep: 73.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3901 [2024-07-31 20:26:37,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.52 | bwd_microstep: 5267.73 | bwd_inner_microstep: 5215.56 | bwd_allreduce_microstep: 52.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 20:26:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.46 | bwd_microstep: 4995.29 | bwd_inner_microstep: 4974.77 | bwd_allreduce_microstep: 20.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 20:26:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 20:26:55,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.38 | bwd_microstep: 5136.79 | bwd_inner_microstep: 5067.56 | bwd_allreduce_microstep: 69.16 | step_microstep: 182.50 [2024-07-31 20:26:55,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29140.95 | bwd: 41650.89 | bwd_inner: 40814.79 | bwd_allreduce: 835.61 | step: 183.12 76%|███████▌ | 929/1230 [18:15:01<5:53:13, 70.41s/it] {'loss': 1.1148, 'learning_rate': 2.9803238920074784e-06, 'epoch': 0.76} 76%|███████▌ | 929/1230 [18:15:01<5:53:13, 70.41s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3852 [2024-07-31 20:27:04,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.28 | bwd_microstep: 5132.86 | bwd_inner_microstep: 5113.83 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3553 [2024-07-31 
20:27:13,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.98 | bwd_microstep: 4982.19 | bwd_inner_microstep: 4923.10 | bwd_allreduce_microstep: 59.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 20:27:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.92 | bwd_microstep: 5183.27 | bwd_inner_microstep: 5102.56 | bwd_allreduce_microstep: 80.65 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 20:27:30,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.19 | bwd_microstep: 4962.21 | bwd_inner_microstep: 4930.72 | bwd_allreduce_microstep: 31.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-31 20:27:39,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.93 | bwd_microstep: 5160.74 | bwd_inner_microstep: 4760.37 | bwd_allreduce_microstep: 400.31 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 20:27:47,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.22 | bwd_microstep: 4976.68 | bwd_inner_microstep: 4592.89 | bwd_allreduce_microstep: 383.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 20:27:55,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.82 | bwd_microstep: 5032.73 | bwd_inner_microstep: 4646.66 | bwd_allreduce_microstep: 386.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 20:28:04,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 20:28:04,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.31 | bwd_microstep: 5170.98 | 
bwd_inner_microstep: 4768.28 | bwd_allreduce_microstep: 402.63 | step_microstep: 182.48 [2024-07-31 20:28:04,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27601.56 | bwd: 40601.66 | bwd_inner: 38838.35 | bwd_allreduce: 1762.84 | step: 183.17 76%|███████▌ | 930/1230 [18:16:10<5:49:13, 69.85s/it] {'loss': 1.1496, 'learning_rate': 2.961593264154038e-06, 'epoch': 0.76} 76%|███████▌ | 930/1230 [18:16:10<5:49:13, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3527 [2024-07-31 20:28:12,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.06 | bwd_microstep: 5331.83 | bwd_inner_microstep: 5231.03 | bwd_allreduce_microstep: 100.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 20:28:21,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.55 | bwd_microstep: 5233.21 | bwd_inner_microstep: 5169.25 | bwd_allreduce_microstep: 63.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 20:28:30,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.97 | bwd_microstep: 5154.77 | bwd_inner_microstep: 4751.32 | bwd_allreduce_microstep: 403.38 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2836 [2024-07-31 20:28:39,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.76 | bwd_microstep: 5204.47 | bwd_inner_microstep: 4800.64 | bwd_allreduce_microstep: 403.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-31 20:28:47,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.65 | bwd_microstep: 5129.90 | bwd_inner_microstep: 5052.92 | bwd_allreduce_microstep: 76.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3695 [2024-07-31 
20:28:56,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.80 | bwd_microstep: 5050.14 | bwd_inner_microstep: 4976.87 | bwd_allreduce_microstep: 73.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-31 20:29:05,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.07 | bwd_microstep: 5165.55 | bwd_inner_microstep: 4762.03 | bwd_allreduce_microstep: 403.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 20:29:13,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 20:29:13,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3424.69 | bwd_microstep: 4989.28 | bwd_inner_microstep: 4938.04 | bwd_allreduce_microstep: 51.18 | step_microstep: 182.15 [2024-07-31 20:29:13,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28083.46 | bwd: 41259.13 | bwd_inner: 39682.03 | bwd_allreduce: 1576.63 | step: 182.73 76%|███████▌ | 931/1230 [18:17:19<5:47:48, 69.80s/it] {'loss': 1.184, 'learning_rate': 2.9429114445203453e-06, 'epoch': 0.76} 76%|███████▌ | 931/1230 [18:17:19<5:47:48, 69.80s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2319 [2024-07-31 20:29:23,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.11 | bwd_microstep: 5743.60 | bwd_inner_microstep: 5330.74 | bwd_allreduce_microstep: 412.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 20:29:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.67 | bwd_microstep: 5534.87 | bwd_inner_microstep: 5481.26 | bwd_allreduce_microstep: 53.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3862 [2024-07-31 20:29:41,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3788.13 | bwd_microstep: 5109.78 | bwd_inner_microstep: 5090.46 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-31 20:29:49,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.87 | bwd_microstep: 5029.73 | bwd_inner_microstep: 4990.82 | bwd_allreduce_microstep: 38.85 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 20:29:58,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.43 | bwd_microstep: 5062.66 | bwd_inner_microstep: 5019.47 | bwd_allreduce_microstep: 43.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 20:30:07,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.40 | bwd_microstep: 4923.43 | bwd_inner_microstep: 4901.16 | bwd_allreduce_microstep: 22.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-31 20:30:15,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.74 | bwd_microstep: 4876.86 | bwd_inner_microstep: 4857.52 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 20:30:24,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 20:30:24,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.03 | bwd_microstep: 5060.39 | bwd_inner_microstep: 5001.08 | bwd_allreduce_microstep: 59.24 | step_microstep: 181.30 [2024-07-31 20:30:24,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29289.28 | bwd: 41341.30 | bwd_inner: 40672.46 | bwd_allreduce: 668.36 | step: 181.90 76%|███████▌ | 932/1230 [18:18:30<5:48:23, 70.15s/it] {'loss': 1.153, 'learning_rate': 2.924278562656514e-06, 'epoch': 
0.76} 76%|███████▌ | 932/1230 [18:18:30<5:48:23, 70.15s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 20:30:33,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.70 | bwd_microstep: 5296.29 | bwd_inner_microstep: 5221.70 | bwd_allreduce_microstep: 74.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-31 20:30:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3165.90 | bwd_microstep: 4638.77 | bwd_inner_microstep: 4619.43 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 20:30:50,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.23 | bwd_microstep: 5102.99 | bwd_inner_microstep: 5035.29 | bwd_allreduce_microstep: 67.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-31 20:30:58,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.76 | bwd_microstep: 5137.61 | bwd_inner_microstep: 5086.53 | bwd_allreduce_microstep: 51.01 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 20:31:07,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.69 | bwd_microstep: 5194.57 | bwd_inner_microstep: 4789.30 | bwd_allreduce_microstep: 405.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-31 20:31:16,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.60 | bwd_microstep: 5211.84 | bwd_inner_microstep: 4807.24 | bwd_allreduce_microstep: 404.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-31 20:31:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3464.95 | bwd_microstep: 5035.71 | bwd_inner_microstep: 4642.59 | bwd_allreduce_microstep: 393.05 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 598 [2024-07-31 20:31:33,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 20:31:33,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3407.39 | bwd_microstep: 5103.14 | bwd_inner_microstep: 4710.49 | bwd_allreduce_microstep: 392.58 | step_microstep: 182.54 [2024-07-31 20:31:33,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27932.13 | bwd: 40720.90 | bwd_inner: 38912.52 | bwd_allreduce: 1807.89 | step: 183.12 76%|███████▌ | 933/1230 [18:19:39<5:45:29, 69.80s/it] {'loss': 1.1785, 'learning_rate': 2.90569474777329e-06, 'epoch': 0.76} 76%|███████▌ | 933/1230 [18:19:39<5:45:29, 69.80s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2360 [2024-07-31 20:31:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.92 | bwd_microstep: 5595.65 | bwd_inner_microstep: 5164.99 | bwd_allreduce_microstep: 430.58 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2007 [2024-07-31 20:31:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.00 | bwd_microstep: 5024.81 | bwd_inner_microstep: 4639.99 | bwd_allreduce_microstep: 384.75 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-31 20:31:59,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.78 | bwd_microstep: 5136.53 | bwd_inner_microstep: 4739.10 | bwd_allreduce_microstep: 397.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3665 [2024-07-31 20:32:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.04 | bwd_microstep: 5052.29 | bwd_inner_microstep: 5007.22 | 
bwd_allreduce_microstep: 45.00 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 20:32:17,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.14 | bwd_microstep: 5167.17 | bwd_inner_microstep: 5088.52 | bwd_allreduce_microstep: 78.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3709 [2024-07-31 20:32:25,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.56 | bwd_microstep: 4998.82 | bwd_inner_microstep: 4935.70 | bwd_allreduce_microstep: 63.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 20:32:34,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.18 | bwd_microstep: 4879.18 | bwd_inner_microstep: 4859.90 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 20:32:43,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 20:32:43,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.44 | bwd_microstep: 5040.03 | bwd_inner_microstep: 4648.99 | bwd_allreduce_microstep: 390.97 | step_microstep: 183.01 [2024-07-31 20:32:43,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28203.94 | bwd: 40894.45 | bwd_inner: 39084.35 | bwd_allreduce: 1809.61 | step: 183.70 76%|███████▌ | 934/1230 [18:20:49<5:43:47, 69.69s/it] {'loss': 1.1467, 'learning_rate': 2.8871601287411634e-06, 'epoch': 0.76} 76%|███████▌ | 934/1230 [18:20:49<5:43:47, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 20:32:52,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.00 | bwd_microstep: 5231.19 | bwd_inner_microstep: 5210.56 | bwd_allreduce_microstep: 20.55 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-31 20:33:01,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.94 | bwd_microstep: 5356.87 | bwd_inner_microstep: 4942.16 | bwd_allreduce_microstep: 414.64 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 20:33:09,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.77 | bwd_microstep: 5057.39 | bwd_inner_microstep: 5030.59 | bwd_allreduce_microstep: 26.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-31 20:33:18,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.25 | bwd_microstep: 5174.50 | bwd_inner_microstep: 5094.30 | bwd_allreduce_microstep: 80.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-31 20:33:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3033.81 | bwd_microstep: 4969.26 | bwd_inner_microstep: 4584.13 | bwd_allreduce_microstep: 385.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-31 20:33:35,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.32 | bwd_microstep: 4981.86 | bwd_inner_microstep: 4962.42 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-31 20:33:44,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.65 | bwd_microstep: 4951.56 | bwd_inner_microstep: 4906.34 | bwd_allreduce_microstep: 45.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 20:33:52,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 20:33:52,736] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3539.97 | bwd_microstep: 4992.86 | bwd_inner_microstep: 4943.49 | bwd_allreduce_microstep: 49.31 | step_microstep: 181.93 [2024-07-31 20:33:52,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28511.62 | bwd: 40715.48 | bwd_inner: 39673.95 | bwd_allreduce: 1041.03 | step: 182.51 76%|███████▌ | 935/1230 [18:21:58<5:42:26, 69.65s/it] {'loss': 1.1201, 'learning_rate': 2.868674834089471e-06, 'epoch': 0.76} 76%|███████▌ | 935/1230 [18:21:58<5:42:26, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4088 [2024-07-31 20:34:02,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3892.36 | bwd_microstep: 5456.61 | bwd_inner_microstep: 5420.56 | bwd_allreduce_microstep: 35.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3907 [2024-07-31 20:34:11,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.91 | bwd_microstep: 5163.06 | bwd_inner_microstep: 5143.64 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-31 20:34:19,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.01 | bwd_microstep: 5047.85 | bwd_inner_microstep: 5028.56 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-31 20:34:28,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.32 | bwd_microstep: 5101.90 | bwd_inner_microstep: 5057.42 | bwd_allreduce_microstep: 44.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-31 20:34:37,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.29 | bwd_microstep: 5128.91 | bwd_inner_microstep: 5076.12 | bwd_allreduce_microstep: 52.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3712 [2024-07-31 20:34:46,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.31 | bwd_microstep: 5183.75 | bwd_inner_microstep: 5106.35 | bwd_allreduce_microstep: 77.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-31 20:34:54,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.26 | bwd_microstep: 5077.83 | bwd_inner_microstep: 5032.00 | bwd_allreduce_microstep: 45.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 20:35:03,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 20:35:03,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.23 | bwd_microstep: 4888.33 | bwd_inner_microstep: 4868.94 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.59 [2024-07-31 20:35:03,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29504.60 | bwd: 41048.23 | bwd_inner: 40733.55 | bwd_allreduce: 314.19 | step: 182.17 76%|███████▌ | 936/1230 [18:23:09<5:43:06, 70.02s/it] {'loss': 1.0852, 'learning_rate': 2.850238992005514e-06, 'epoch': 0.76} 76%|███████▌ | 936/1230 [18:23:09<5:43:06, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3927 [2024-07-31 20:35:12,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.39 | bwd_microstep: 5365.61 | bwd_inner_microstep: 5311.77 | bwd_allreduce_microstep: 53.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3882 [2024-07-31 20:35:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.79 | bwd_microstep: 5127.62 | bwd_inner_microstep: 5108.24 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-31 20:35:30,436] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.73 | bwd_microstep: 5238.01 | bwd_inner_microstep: 4831.39 | bwd_allreduce_microstep: 406.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 20:35:39,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.22 | bwd_microstep: 5182.41 | bwd_inner_microstep: 5125.26 | bwd_allreduce_microstep: 57.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-31 20:35:47,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.33 | bwd_microstep: 4979.30 | bwd_inner_microstep: 4946.07 | bwd_allreduce_microstep: 33.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 20:35:56,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.35 | bwd_microstep: 5124.37 | bwd_inner_microstep: 5053.06 | bwd_allreduce_microstep: 71.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 20:36:05,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.40 | bwd_microstep: 4991.98 | bwd_inner_microstep: 4942.41 | bwd_allreduce_microstep: 49.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-31 20:36:13,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 20:36:13,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.13 | bwd_microstep: 4987.70 | bwd_inner_microstep: 4953.28 | bwd_allreduce_microstep: 34.36 | step_microstep: 208.93 [2024-07-31 20:36:13,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28942.24 | bwd: 40996.98 | bwd_inner: 40271.42 | bwd_allreduce: 725.08 | step: 209.52 76%|███████▌ | 937/1230 [18:24:19<5:42:20, 70.10s/it] {'loss': 1.1701, 
'learning_rate': 2.8318527303336465e-06, 'epoch': 0.76} 76%|███████▌ | 937/1230 [18:24:19<5:42:20, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940 [2024-07-31 20:36:23,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.00 | bwd_microstep: 5526.31 | bwd_inner_microstep: 5446.83 | bwd_allreduce_microstep: 79.42 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3933 [2024-07-31 20:36:31,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3304.38 | bwd_microstep: 4970.51 | bwd_inner_microstep: 4951.14 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811 [2024-07-31 20:36:40,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.35 | bwd_microstep: 5032.40 | bwd_inner_microstep: 5013.14 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-31 20:36:48,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.96 | bwd_microstep: 5120.60 | bwd_inner_microstep: 5042.29 | bwd_allreduce_microstep: 78.24 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-31 20:36:57,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.52 | bwd_microstep: 5115.09 | bwd_inner_microstep: 5073.76 | bwd_allreduce_microstep: 41.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-31 20:37:05,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.82 | bwd_microstep: 5047.53 | bwd_inner_microstep: 4657.07 | bwd_allreduce_microstep: 390.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2125 [2024-07-31 20:37:14,545] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.66 | bwd_microstep: 5142.26 | bwd_inner_microstep: 4743.25 | bwd_allreduce_microstep: 398.94 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 20:37:23,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 20:37:23,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.26 | bwd_microstep: 4985.04 | bwd_inner_microstep: 4965.69 | bwd_allreduce_microstep: 19.28 | step_microstep: 182.57 [2024-07-31 20:37:23,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28277.87 | bwd: 40939.72 | bwd_inner: 39893.12 | bwd_allreduce: 1046.12 | step: 183.30 76%|███████▋ | 938/1230 [18:25:29<5:40:21, 69.94s/it] {'loss': 1.1229, 'learning_rate': 2.81351617657442e-06, 'epoch': 0.76} 76%|███████▋ | 938/1230 [18:25:29<5:40:21, 69.94s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-31 20:37:32,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.31 | bwd_microstep: 5581.85 | bwd_inner_microstep: 5481.08 | bwd_allreduce_microstep: 100.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846 [2024-07-31 20:37:41,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.61 | bwd_microstep: 5166.53 | bwd_inner_microstep: 5139.39 | bwd_allreduce_microstep: 27.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-31 20:37:50,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.71 | bwd_microstep: 5115.41 | bwd_inner_microstep: 5088.87 | bwd_allreduce_microstep: 26.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 20:37:58,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.84 | 
bwd_microstep: 4885.77 | bwd_inner_microstep: 4841.38 | bwd_allreduce_microstep: 44.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 20:38:06,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.76 | bwd_microstep: 4868.97 | bwd_inner_microstep: 4827.41 | bwd_allreduce_microstep: 41.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3679 [2024-07-31 20:38:15,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.05 | bwd_microstep: 5168.02 | bwd_inner_microstep: 5083.16 | bwd_allreduce_microstep: 84.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 20:38:24,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.71 | bwd_microstep: 4955.83 | bwd_inner_microstep: 4922.90 | bwd_allreduce_microstep: 32.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 20:38:33,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 20:38:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.13 | bwd_microstep: 5075.21 | bwd_inner_microstep: 5018.79 | bwd_allreduce_microstep: 56.35 | step_microstep: 181.62 [2024-07-31 20:38:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28503.03 | bwd: 40817.57 | bwd_inner: 40402.91 | bwd_allreduce: 414.19 | step: 182.19 76%|███████▋ | 939/1230 [18:26:39<5:38:47, 69.85s/it] {'loss': 1.0841, 'learning_rate': 2.795229457883678e-06, 'epoch': 0.76} 76%|███████▋ | 939/1230 [18:26:39<5:38:47, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4070 [2024-07-31 20:38:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.27 | bwd_microstep: 5391.32 | bwd_inner_microstep: 5354.74 | 
bwd_allreduce_microstep: 36.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-31 20:38:51,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.38 | bwd_microstep: 5084.86 | bwd_inner_microstep: 5048.85 | bwd_allreduce_microstep: 35.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3587 [2024-07-31 20:38:59,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.71 | bwd_microstep: 5113.23 | bwd_inner_microstep: 5024.82 | bwd_allreduce_microstep: 88.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3579 [2024-07-31 20:39:08,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.86 | bwd_microstep: 5129.08 | bwd_inner_microstep: 5027.25 | bwd_allreduce_microstep: 101.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-31 20:39:17,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.20 | bwd_microstep: 5054.00 | bwd_inner_microstep: 4661.75 | bwd_allreduce_microstep: 392.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-31 20:39:25,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.13 | bwd_microstep: 4831.65 | bwd_inner_microstep: 4790.86 | bwd_allreduce_microstep: 40.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 20:39:33,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.77 | bwd_microstep: 5197.59 | bwd_inner_microstep: 4795.00 | bwd_allreduce_microstep: 402.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 20:39:42,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 
[2024-07-31 20:39:42,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.45 | bwd_microstep: 5061.18 | bwd_inner_microstep: 5001.72 | bwd_allreduce_microstep: 59.40 | step_microstep: 181.61 [2024-07-31 20:39:42,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28338.68 | bwd: 40862.89 | bwd_inner: 39704.91 | bwd_allreduce: 1157.48 | step: 182.19 76%|███████▋ | 940/1230 [18:27:48<5:37:09, 69.76s/it] {'loss': 1.1177, 'learning_rate': 2.776992701071678e-06, 'epoch': 0.76} 76%|███████▋ | 940/1230 [18:27:48<5:37:09, 69.76s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2381 [2024-07-31 20:39:51,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.93 | bwd_microstep: 5559.54 | bwd_inner_microstep: 5133.81 | bwd_allreduce_microstep: 425.66 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3912 [2024-07-31 20:40:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.25 | bwd_microstep: 5180.27 | bwd_inner_microstep: 5140.92 | bwd_allreduce_microstep: 39.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3864 [2024-07-31 20:40:09,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.67 | bwd_microstep: 5168.58 | bwd_inner_microstep: 5121.59 | bwd_allreduce_microstep: 46.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-31 20:40:18,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.08 | bwd_microstep: 5115.10 | bwd_inner_microstep: 5067.47 | bwd_allreduce_microstep: 47.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 20:40:26,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.99 | bwd_microstep: 5045.92 | bwd_inner_microstep: 4658.73 | 
bwd_allreduce_microstep: 387.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 20:40:34,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.91 | bwd_microstep: 4878.50 | bwd_inner_microstep: 4859.06 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-31 20:40:43,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.46 | bwd_microstep: 4984.76 | bwd_inner_microstep: 4932.58 | bwd_allreduce_microstep: 52.11 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3663 [2024-07-31 20:40:52,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 20:40:52,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.68 | bwd_microstep: 5028.89 | bwd_inner_microstep: 4962.62 | bwd_allreduce_microstep: 66.21 | step_microstep: 181.76 [2024-07-31 20:40:52,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28363.85 | bwd: 40961.53 | bwd_inner: 39876.72 | bwd_allreduce: 1084.34 | step: 182.35 77%|███████▋ | 941/1230 [18:28:58<5:35:51, 69.73s/it] {'loss': 1.1815, 'learning_rate': 2.7588060326022205e-06, 'epoch': 0.76} 77%|███████▋ | 941/1230 [18:28:58<5:35:51, 69.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3958 [2024-07-31 20:41:01,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.89 | bwd_microstep: 5175.48 | bwd_inner_microstep: 5156.33 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 20:41:10,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.35 | bwd_microstep: 5037.61 | bwd_inner_microstep: 5014.39 | bwd_allreduce_microstep: 23.15 | step_microstep: 0.19 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-31 20:41:18,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.44 | bwd_microstep: 4862.32 | bwd_inner_microstep: 4841.28 | bwd_allreduce_microstep: 20.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-31 20:41:27,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.23 | bwd_microstep: 5160.95 | bwd_inner_microstep: 5114.87 | bwd_allreduce_microstep: 46.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-31 20:41:36,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.82 | bwd_microstep: 5117.28 | bwd_inner_microstep: 5066.32 | bwd_allreduce_microstep: 50.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 20:41:44,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.60 | bwd_microstep: 5100.37 | bwd_inner_microstep: 4704.25 | bwd_allreduce_microstep: 396.05 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 20:41:53,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.05 | bwd_microstep: 4978.54 | bwd_inner_microstep: 4945.00 | bwd_allreduce_microstep: 33.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 20:42:01,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 20:42:01,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.20 | bwd_microstep: 4797.06 | bwd_inner_microstep: 4777.78 | bwd_allreduce_microstep: 19.21 | step_microstep: 183.07 [2024-07-31 20:42:01,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28773.49 | bwd: 40229.60 | bwd_inner: 39620.15 | bwd_allreduce: 
608.96 | step: 183.78 77%|███████▋ | 942/1230 [18:30:07<5:34:07, 69.61s/it] {'loss': 1.0721, 'learning_rate': 2.740669578591755e-06, 'epoch': 0.77} 77%|███████▋ | 942/1230 [18:30:07<5:34:07, 69.61s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2033 [2024-07-31 20:42:10,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.36 | bwd_microstep: 5423.97 | bwd_inner_microstep: 5005.34 | bwd_allreduce_microstep: 418.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-31 20:42:19,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.49 | bwd_microstep: 5083.48 | bwd_inner_microstep: 5057.59 | bwd_allreduce_microstep: 25.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 20:42:28,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.08 | bwd_microstep: 5209.36 | bwd_inner_microstep: 5127.83 | bwd_allreduce_microstep: 81.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-31 20:42:37,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.99 | bwd_microstep: 5139.62 | bwd_inner_microstep: 5065.39 | bwd_allreduce_microstep: 74.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 20:42:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.74 | bwd_microstep: 5003.37 | bwd_inner_microstep: 4984.05 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 20:42:54,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.23 | bwd_microstep: 5065.53 | bwd_inner_microstep: 5008.45 | bwd_allreduce_microstep: 57.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3666 [2024-07-31 20:43:03,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.82 | bwd_microstep: 4957.47 | bwd_inner_microstep: 4910.13 | bwd_allreduce_microstep: 47.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 20:43:11,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 20:43:11,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.59 | bwd_microstep: 5012.43 | bwd_inner_microstep: 4958.25 | bwd_allreduce_microstep: 54.11 | step_microstep: 181.44 [2024-07-31 20:43:11,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28983.21 | bwd: 40895.22 | bwd_inner: 40116.97 | bwd_allreduce: 777.76 | step: 182.01 77%|███████▋ | 943/1230 [18:31:17<5:33:49, 69.79s/it] {'loss': 1.1371, 'learning_rate': 2.722583464808525e-06, 'epoch': 0.77} 77%|███████▋ | 943/1230 [18:31:17<5:33:49, 69.79s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4013 [2024-07-31 20:43:20,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3332.69 | bwd_microstep: 5066.36 | bwd_inner_microstep: 5047.32 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819 [2024-07-31 20:43:29,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.33 | bwd_microstep: 5068.41 | bwd_inner_microstep: 5047.24 | bwd_allreduce_microstep: 21.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-31 20:43:37,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.62 | bwd_microstep: 5047.06 | bwd_inner_microstep: 5025.00 | bwd_allreduce_microstep: 22.00 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2087 [2024-07-31 20:43:46,777] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.26 | bwd_microstep: 5235.22 | bwd_inner_microstep: 4826.79 | bwd_allreduce_microstep: 408.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 20:43:55,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.77 | bwd_microstep: 5382.04 | bwd_inner_microstep: 5279.15 | bwd_allreduce_microstep: 102.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 20:44:04,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.31 | bwd_microstep: 5022.48 | bwd_inner_microstep: 4995.70 | bwd_allreduce_microstep: 26.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3796 [2024-07-31 20:44:13,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.85 | bwd_microstep: 5355.90 | bwd_inner_microstep: 5189.23 | bwd_allreduce_microstep: 166.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3703 [2024-07-31 20:44:22,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-31 20:44:22,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.26 | bwd_microstep: 5077.00 | bwd_inner_microstep: 5003.66 | bwd_allreduce_microstep: 73.27 | step_microstep: 181.24 [2024-07-31 20:44:22,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29051.99 | bwd: 41254.45 | bwd_inner: 40414.03 | bwd_allreduce: 839.94 | step: 181.83 77%|███████▋ | 944/1230 [18:32:28<5:33:52, 70.05s/it] {'loss': 1.1468, 'learning_rate': 2.7045478166716843e-06, 'epoch': 0.77} 77%|███████▋ | 944/1230 [18:32:28<5:33:52, 70.05s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-31 20:44:31,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.84 | 
bwd_microstep: 5469.05 | bwd_inner_microstep: 5324.85 | bwd_allreduce_microstep: 144.13 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2216 [2024-07-31 20:44:40,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.96 | bwd_microstep: 5239.09 | bwd_inner_microstep: 4832.78 | bwd_allreduce_microstep: 406.24 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3854 [2024-07-31 20:44:49,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.99 | bwd_microstep: 5067.65 | bwd_inner_microstep: 5017.96 | bwd_allreduce_microstep: 49.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 20:44:57,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.29 | bwd_microstep: 5068.03 | bwd_inner_microstep: 5008.18 | bwd_allreduce_microstep: 59.78 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-31 20:45:06,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.44 | bwd_microstep: 4981.78 | bwd_inner_microstep: 4962.34 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 20:45:15,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.47 | bwd_microstep: 5056.89 | bwd_inner_microstep: 4664.37 | bwd_allreduce_microstep: 392.44 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-31 20:45:23,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3304.69 | bwd_microstep: 4902.24 | bwd_inner_microstep: 4861.02 | bwd_allreduce_microstep: 41.15 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3660 [2024-07-31 20:45:32,172] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 20:45:32,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.55 | bwd_microstep: 5041.72 | bwd_inner_microstep: 4995.90 | bwd_allreduce_microstep: 45.75 | step_microstep: 181.51 [2024-07-31 20:45:32,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28487.12 | bwd: 40826.43 | bwd_inner: 39667.34 | bwd_allreduce: 1158.58 | step: 182.09 77%|███████▋ | 945/1230 [18:33:38<5:32:10, 69.93s/it] {'loss': 1.1387, 'learning_rate': 2.686562759250433e-06, 'epoch': 0.77} 77%|███████▋ | 945/1230 [18:33:38<5:32:10, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3986 [2024-07-31 20:45:41,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.44 | bwd_microstep: 5363.01 | bwd_inner_microstep: 5322.48 | bwd_allreduce_microstep: 40.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3953 [2024-07-31 20:45:50,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.14 | bwd_microstep: 5179.84 | bwd_inner_microstep: 5160.45 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2318 [2024-07-31 20:45:59,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.66 | bwd_microstep: 5396.72 | bwd_inner_microstep: 4978.30 | bwd_allreduce_microstep: 418.36 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 20:46:08,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.40 | bwd_microstep: 5045.61 | bwd_inner_microstep: 5018.61 | bwd_allreduce_microstep: 26.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 20:46:16,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.85 | 
bwd_microstep: 5005.00 | bwd_inner_microstep: 4951.66 | bwd_allreduce_microstep: 53.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2146 [2024-07-31 20:46:25,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.72 | bwd_microstep: 5219.78 | bwd_inner_microstep: 4816.06 | bwd_allreduce_microstep: 403.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 20:46:33,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.61 | bwd_microstep: 4781.17 | bwd_inner_microstep: 4748.83 | bwd_allreduce_microstep: 32.27 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 20:46:42,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 20:46:42,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3440.19 | bwd_microstep: 5015.61 | bwd_inner_microstep: 4626.95 | bwd_allreduce_microstep: 388.59 | step_microstep: 181.77 [2024-07-31 20:46:42,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28667.92 | bwd: 41006.72 | bwd_inner: 39623.27 | bwd_allreduce: 1382.97 | step: 182.48 77%|███████▋ | 946/1230 [18:34:48<5:31:06, 69.95s/it] {'loss': 1.1026, 'learning_rate': 2.668628417263135e-06, 'epoch': 0.77} 77%|███████▋ | 946/1230 [18:34:48<5:31:06, 69.95s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778 [2024-07-31 20:46:51,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.12 | bwd_microstep: 5271.17 | bwd_inner_microstep: 5206.69 | bwd_allreduce_microstep: 64.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3970 [2024-07-31 20:47:00,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.15 | bwd_microstep: 5239.77 | bwd_inner_microstep: 5220.38 | 
bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3766 [2024-07-31 20:47:08,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.95 | bwd_microstep: 5092.01 | bwd_inner_microstep: 5057.52 | bwd_allreduce_microstep: 34.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-31 20:47:17,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.86 | bwd_microstep: 5120.34 | bwd_inner_microstep: 5050.41 | bwd_allreduce_microstep: 69.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-31 20:47:26,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.80 | bwd_microstep: 5110.77 | bwd_inner_microstep: 5040.45 | bwd_allreduce_microstep: 70.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-31 20:47:35,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.47 | bwd_microstep: 5068.39 | bwd_inner_microstep: 5010.69 | bwd_allreduce_microstep: 57.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 20:47:43,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.73 | bwd_microstep: 5188.43 | bwd_inner_microstep: 4786.47 | bwd_allreduce_microstep: 401.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-31 20:47:52,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 20:47:52,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.79 | bwd_microstep: 5107.57 | bwd_inner_microstep: 5039.13 | bwd_allreduce_microstep: 68.37 | step_microstep: 181.38 [2024-07-31 20:47:52,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29003.79 | bwd: 41198.42 | bwd_inner: 40411.67 | bwd_allreduce: 786.26 | step: 181.95 77%|███████▋ | 947/1230 [18:35:58<5:30:46, 70.13s/it] {'loss': 1.1647, 'learning_rate': 2.6507449150764852e-06, 'epoch': 0.77} 77%|███████▋ | 947/1230 [18:35:58<5:30:46, 70.13s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3932 [2024-07-31 20:48:01,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.74 | bwd_microstep: 5271.19 | bwd_inner_microstep: 5209.15 | bwd_allreduce_microstep: 61.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3858 [2024-07-31 20:48:10,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.51 | bwd_microstep: 5282.89 | bwd_inner_microstep: 5220.89 | bwd_allreduce_microstep: 61.93 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1273 [2024-07-31 20:48:19,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.95 | bwd_microstep: 5230.33 | bwd_inner_microstep: 4827.17 | bwd_allreduce_microstep: 403.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-31 20:48:27,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.95 | bwd_microstep: 4829.86 | bwd_inner_microstep: 4804.46 | bwd_allreduce_microstep: 25.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 20:48:36,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.28 | bwd_microstep: 4960.57 | bwd_inner_microstep: 4930.75 | bwd_allreduce_microstep: 29.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-31 20:48:44,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.92 | bwd_microstep: 5005.51 | bwd_inner_microstep: 4986.13 | 
bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 20:48:53,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.73 | bwd_microstep: 5030.14 | bwd_inner_microstep: 4973.89 | bwd_allreduce_microstep: 56.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 20:49:02,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 20:49:02,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.34 | bwd_microstep: 5056.24 | bwd_inner_microstep: 4996.48 | bwd_allreduce_microstep: 59.69 | step_microstep: 181.65 [2024-07-31 20:49:02,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28553.32 | bwd: 40666.70 | bwd_inner: 39948.88 | bwd_allreduce: 717.34 | step: 182.24 77%|███████▋ | 948/1230 [18:37:08<5:28:47, 69.96s/it] {'loss': 1.1644, 'learning_rate': 2.632912376704607e-06, 'epoch': 0.77} 77%|███████▋ | 948/1230 [18:37:08<5:28:47, 69.96s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 20:49:11,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3891.97 | bwd_microstep: 5459.40 | bwd_inner_microstep: 5431.97 | bwd_allreduce_microstep: 27.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-31 20:49:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3316.07 | bwd_microstep: 5134.04 | bwd_inner_microstep: 5062.83 | bwd_allreduce_microstep: 71.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-31 20:49:28,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.78 | bwd_microstep: 5030.76 | bwd_inner_microstep: 5011.35 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 20:49:37,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.99 | bwd_microstep: 5038.70 | bwd_inner_microstep: 5013.86 | bwd_allreduce_microstep: 24.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 20:49:45,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3005.68 | bwd_microstep: 4930.02 | bwd_inner_microstep: 4551.07 | bwd_allreduce_microstep: 378.88 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-31 20:49:54,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.29 | bwd_microstep: 4968.96 | bwd_inner_microstep: 4927.76 | bwd_allreduce_microstep: 41.13 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3678 [2024-07-31 20:50:03,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.42 | bwd_microstep: 5220.30 | bwd_inner_microstep: 5119.76 | bwd_allreduce_microstep: 100.47 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3699 [2024-07-31 20:50:12,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 20:50:12,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.79 | bwd_microstep: 5198.31 | bwd_inner_microstep: 5104.76 | bwd_allreduce_microstep: 93.49 | step_microstep: 181.89 [2024-07-31 20:50:12,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28472.89 | bwd: 40980.48 | bwd_inner: 40223.30 | bwd_allreduce: 756.70 | step: 182.48 77%|███████▋ | 949/1230 [18:38:17<5:27:23, 69.90s/it] {'loss': 1.1347, 'learning_rate': 2.615130925808228e-06, 'epoch': 0.77} 77%|███████▋ | 949/1230 [18:38:17<5:27:23, 69.90s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3936 [2024-07-31 
20:50:21,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.35 | bwd_microstep: 5455.91 | bwd_inner_microstep: 5400.14 | bwd_allreduce_microstep: 55.72 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2302 [2024-07-31 20:50:29,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3123.32 | bwd_microstep: 5328.26 | bwd_inner_microstep: 4919.26 | bwd_allreduce_microstep: 408.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-31 20:50:38,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.69 | bwd_microstep: 5201.12 | bwd_inner_microstep: 4796.68 | bwd_allreduce_microstep: 404.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3624 [2024-07-31 20:50:47,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.90 | bwd_microstep: 5153.14 | bwd_inner_microstep: 5064.38 | bwd_allreduce_microstep: 88.69 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 20:50:56,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.42 | bwd_microstep: 5047.80 | bwd_inner_microstep: 5019.21 | bwd_allreduce_microstep: 28.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 20:51:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.06 | bwd_microstep: 5277.37 | bwd_inner_microstep: 4867.70 | bwd_allreduce_microstep: 409.60 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 20:51:13,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3399.57 | bwd_microstep: 4928.03 | bwd_inner_microstep: 4886.32 | bwd_allreduce_microstep: 41.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images 
per sample: 13.0, dynamic token length: 3694 [2024-07-31 20:51:22,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 20:51:22,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.20 | bwd_microstep: 4886.37 | bwd_inner_microstep: 4866.70 | bwd_allreduce_microstep: 19.60 | step_microstep: 183.62 [2024-07-31 20:51:22,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28560.41 | bwd: 41277.99 | bwd_inner: 39820.32 | bwd_allreduce: 1457.17 | step: 184.33 77%|███████▋ | 950/1230 [18:39:28<5:26:35, 69.99s/it] {'loss': 1.1401, 'learning_rate': 2.5974006856937917e-06, 'epoch': 0.77} 77%|███████▋ | 950/1230 [18:39:28<5:26:35, 69.99s/it]dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 4096 [2024-07-31 20:51:31,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.20 | bwd_microstep: 5316.80 | bwd_inner_microstep: 5297.75 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3823 [2024-07-31 20:51:40,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.39 | bwd_microstep: 5212.23 | bwd_inner_microstep: 5171.02 | bwd_allreduce_microstep: 41.15 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-31 20:51:49,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.81 | bwd_microstep: 5042.84 | bwd_inner_microstep: 5022.38 | bwd_allreduce_microstep: 20.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 20:51:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.44 | bwd_microstep: 4802.79 | bwd_inner_microstep: 4769.49 | bwd_allreduce_microstep: 33.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-31 20:52:05,882] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.47 | bwd_microstep: 5117.69 | bwd_inner_microstep: 5072.24 | bwd_allreduce_microstep: 45.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 20:52:14,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3307.18 | bwd_microstep: 4911.11 | bwd_inner_microstep: 4871.06 | bwd_allreduce_microstep: 39.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-31 20:52:22,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.20 | bwd_microstep: 5111.98 | bwd_inner_microstep: 4715.16 | bwd_allreduce_microstep: 396.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 20:52:31,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 20:52:31,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.48 | bwd_microstep: 4997.99 | bwd_inner_microstep: 4947.43 | bwd_allreduce_microstep: 50.48 | step_microstep: 181.32 [2024-07-31 20:52:31,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28439.08 | bwd: 40513.41 | bwd_inner: 39866.47 | bwd_allreduce: 646.44 | step: 181.92 77%|███████▋ | 951/1230 [18:40:37<5:24:27, 69.78s/it] {'loss': 1.1395, 'learning_rate': 2.5797217793126373e-06, 'epoch': 0.77} 77%|███████▋ | 951/1230 [18:40:37<5:24:27, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3842 [2024-07-31 20:52:40,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.21 | bwd_microstep: 5188.90 | bwd_inner_microstep: 5147.08 | bwd_allreduce_microstep: 41.75 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3859 [2024-07-31 20:52:49,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.48 | 
bwd_microstep: 5238.90 | bwd_inner_microstep: 5166.33 | bwd_allreduce_microstep: 72.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3897 [2024-07-31 20:52:58,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.30 | bwd_microstep: 5230.87 | bwd_inner_microstep: 5180.76 | bwd_allreduce_microstep: 50.05 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2231 [2024-07-31 20:53:06,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.60 | bwd_microstep: 5186.42 | bwd_inner_microstep: 4781.23 | bwd_allreduce_microstep: 405.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 20:53:15,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.59 | bwd_microstep: 5125.68 | bwd_inner_microstep: 5073.66 | bwd_allreduce_microstep: 51.96 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3644 [2024-07-31 20:53:24,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.80 | bwd_microstep: 5075.08 | bwd_inner_microstep: 4987.22 | bwd_allreduce_microstep: 87.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-31 20:53:32,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.15 | bwd_microstep: 5053.20 | bwd_inner_microstep: 4660.92 | bwd_allreduce_microstep: 392.21 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2167 [2024-07-31 20:53:41,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 20:53:41,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.57 | bwd_microstep: 5239.90 | bwd_inner_microstep: 4834.27 | bwd_allreduce_microstep: 405.56 | step_microstep: 184.09 [2024-07-31 
20:53:41,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28615.59 | bwd: 41338.94 | bwd_inner: 39831.40 | bwd_allreduce: 1507.05 | step: 184.67 77%|███████▋ | 952/1230 [18:41:47<5:24:00, 69.93s/it] {'loss': 1.1204, 'learning_rate': 2.5620943292601074e-06, 'epoch': 0.77} 77%|███████▋ | 952/1230 [18:41:47<5:24:00, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-31 20:53:50,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.24 | bwd_microstep: 5194.01 | bwd_inner_microstep: 5148.98 | bwd_allreduce_microstep: 44.96 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3244 [2024-07-31 20:53:59,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.10 | bwd_microstep: 5203.12 | bwd_inner_microstep: 5000.65 | bwd_allreduce_microstep: 202.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3641 [2024-07-31 20:54:07,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3106.46 | bwd_microstep: 4960.74 | bwd_inner_microstep: 4894.97 | bwd_allreduce_microstep: 65.70 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2213 [2024-07-31 20:54:16,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.95 | bwd_microstep: 5197.66 | bwd_inner_microstep: 4794.58 | bwd_allreduce_microstep: 403.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-31 20:54:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.49 | bwd_microstep: 5202.64 | bwd_inner_microstep: 4798.92 | bwd_allreduce_microstep: 403.66 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 20:54:33,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.72 | 
bwd_microstep: 4905.15 | bwd_inner_microstep: 4881.59 | bwd_allreduce_microstep: 23.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-31 20:54:42,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.35 | bwd_microstep: 4920.29 | bwd_inner_microstep: 4895.43 | bwd_allreduce_microstep: 24.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 20:54:51,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 20:54:51,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.09 | bwd_microstep: 5025.50 | bwd_inner_microstep: 4966.66 | bwd_allreduce_microstep: 58.78 | step_microstep: 182.04 [2024-07-31 20:54:51,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28417.30 | bwd: 40609.08 | bwd_inner: 39381.71 | bwd_allreduce: 1226.90 | step: 182.62 77%|███████▋ | 953/1230 [18:42:57<5:22:02, 69.76s/it] {'loss': 1.173, 'learning_rate': 2.544518457774734e-06, 'epoch': 0.77} 77%|███████▋ | 953/1230 [18:42:57<5:22:02, 69.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-31 20:55:00,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.71 | bwd_microstep: 5375.63 | bwd_inner_microstep: 5280.42 | bwd_allreduce_microstep: 95.14 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2263 [2024-07-31 20:55:09,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.35 | bwd_microstep: 5283.31 | bwd_inner_microstep: 4872.44 | bwd_allreduce_microstep: 410.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-31 20:55:17,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.32 | bwd_microstep: 5263.23 | bwd_inner_microstep: 4853.22 | 
bwd_allreduce_microstep: 409.94 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-31 20:55:26,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.87 | bwd_microstep: 5150.02 | bwd_inner_microstep: 5096.65 | bwd_allreduce_microstep: 53.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 20:55:35,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.71 | bwd_microstep: 5159.59 | bwd_inner_microstep: 5080.97 | bwd_allreduce_microstep: 78.55 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 20:55:44,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.90 | bwd_microstep: 4939.54 | bwd_inner_microstep: 4912.36 | bwd_allreduce_microstep: 27.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-31 20:55:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.73 | bwd_microstep: 5056.44 | bwd_inner_microstep: 5014.46 | bwd_allreduce_microstep: 41.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-31 20:56:01,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 20:56:01,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.54 | bwd_microstep: 4924.04 | bwd_inner_microstep: 4899.46 | bwd_allreduce_microstep: 24.51 | step_microstep: 181.61 [2024-07-31 20:56:01,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29152.03 | bwd: 41151.80 | bwd_inner: 40009.93 | bwd_allreduce: 1141.37 | step: 182.31 78%|███████▊ | 954/1230 [18:44:07<5:22:06, 70.02s/it] {'loss': 1.1249, 'learning_rate': 2.5269942867373565e-06, 'epoch': 0.78} 78%|███████▊ | 954/1230 [18:44:07<5:22:06, 70.02s/it]dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 20:56:11,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.56 | bwd_microstep: 5343.43 | bwd_inner_microstep: 5324.35 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-31 20:56:19,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.74 | bwd_microstep: 5212.54 | bwd_inner_microstep: 5156.29 | bwd_allreduce_microstep: 56.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-31 20:56:28,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.59 | bwd_microstep: 5081.03 | bwd_inner_microstep: 5050.44 | bwd_allreduce_microstep: 30.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 20:56:36,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3157.22 | bwd_microstep: 5033.23 | bwd_inner_microstep: 4643.65 | bwd_allreduce_microstep: 389.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-31 20:56:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.74 | bwd_microstep: 5000.56 | bwd_inner_microstep: 4981.23 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-31 20:56:54,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.31 | bwd_microstep: 5254.88 | bwd_inner_microstep: 4848.39 | bwd_allreduce_microstep: 406.42 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2123 [2024-07-31 20:57:03,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.32 | bwd_microstep: 5194.16 | bwd_inner_microstep: 4788.29 | 
bwd_allreduce_microstep: 405.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 20:57:12,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 20:57:12,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.70 | bwd_microstep: 5319.07 | bwd_inner_microstep: 5243.17 | bwd_allreduce_microstep: 75.83 | step_microstep: 181.52 [2024-07-31 20:57:12,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28931.08 | bwd: 41438.88 | bwd_inner: 40035.75 | bwd_allreduce: 1402.63 | step: 182.11 78%|███████▊ | 955/1230 [18:45:18<5:21:51, 70.23s/it] {'loss': 1.1115, 'learning_rate': 2.5095219376703183e-06, 'epoch': 0.78} 78%|███████▊ | 955/1230 [18:45:18<5:21:51, 70.23s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 20:57:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.44 | bwd_microstep: 5219.64 | bwd_inner_microstep: 5200.54 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2071 [2024-07-31 20:57:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.96 | bwd_microstep: 5324.30 | bwd_inner_microstep: 4911.65 | bwd_allreduce_microstep: 412.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 20:57:39,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.47 | bwd_microstep: 5222.91 | bwd_inner_microstep: 5137.88 | bwd_allreduce_microstep: 84.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-31 20:57:47,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.74 | bwd_microstep: 5013.43 | bwd_inner_microstep: 4994.10 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-31 20:57:56,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.79 | bwd_microstep: 5123.13 | bwd_inner_microstep: 5048.61 | bwd_allreduce_microstep: 74.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 20:58:05,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.59 | bwd_microstep: 5096.25 | bwd_inner_microstep: 4700.75 | bwd_allreduce_microstep: 395.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 20:58:14,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.97 | bwd_microstep: 5193.38 | bwd_inner_microstep: 5117.34 | bwd_allreduce_microstep: 75.97 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2147 [2024-07-31 20:58:22,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 20:58:22,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.45 | bwd_microstep: 5104.41 | bwd_inner_microstep: 4709.41 | bwd_allreduce_microstep: 394.93 | step_microstep: 181.15 [2024-07-31 20:58:22,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28865.31 | bwd: 41297.43 | bwd_inner: 39820.23 | bwd_allreduce: 1476.72 | step: 181.73 78%|███████▊ | 956/1230 [18:46:28<5:21:03, 70.31s/it] {'loss': 1.1363, 'learning_rate': 2.4921015317365794e-06, 'epoch': 0.78} 78%|███████▊ | 956/1230 [18:46:28<5:21:03, 70.31s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3546 [2024-07-31 20:58:31,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.84 | bwd_microstep: 5180.97 | bwd_inner_microstep: 5091.40 | bwd_allreduce_microstep: 89.51 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3559 [2024-07-31 
20:58:40,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.50 | bwd_microstep: 5186.82 | bwd_inner_microstep: 5078.56 | bwd_allreduce_microstep: 108.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 20:58:48,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.02 | bwd_microstep: 4663.20 | bwd_inner_microstep: 4637.43 | bwd_allreduce_microstep: 25.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-31 20:58:57,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.82 | bwd_microstep: 5178.61 | bwd_inner_microstep: 5122.27 | bwd_allreduce_microstep: 56.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 20:59:05,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.20 | bwd_microstep: 5005.85 | bwd_inner_microstep: 4953.94 | bwd_allreduce_microstep: 51.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-31 20:59:14,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.25 | bwd_microstep: 4998.09 | bwd_inner_microstep: 4978.64 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-31 20:59:23,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.92 | bwd_microstep: 5108.34 | bwd_inner_microstep: 5044.49 | bwd_allreduce_microstep: 63.79 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2937 [2024-07-31 20:59:32,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 20:59:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.21 | bwd_microstep: 4984.97 | 
bwd_inner_microstep: 4645.15 | bwd_allreduce_microstep: 339.75 | step_microstep: 183.31 [2024-07-31 20:59:32,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28388.65 | bwd: 40306.84 | bwd_inner: 39551.82 | bwd_allreduce: 754.53 | step: 183.90 78%|███████▊ | 957/1230 [18:47:37<5:18:09, 69.92s/it] {'loss': 1.1512, 'learning_rate': 2.474733189738908e-06, 'epoch': 0.78} 78%|███████▊ | 957/1230 [18:47:37<5:18:09, 69.92s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3895 [2024-07-31 20:59:41,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.45 | bwd_microstep: 5339.44 | bwd_inner_microstep: 5257.92 | bwd_allreduce_microstep: 81.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-31 20:59:49,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.40 | bwd_microstep: 5019.91 | bwd_inner_microstep: 5000.55 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 20:59:58,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.34 | bwd_microstep: 5178.67 | bwd_inner_microstep: 5103.44 | bwd_allreduce_microstep: 75.15 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 21:00:07,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.07 | bwd_microstep: 5082.50 | bwd_inner_microstep: 5017.31 | bwd_allreduce_microstep: 65.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 21:00:15,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.55 | bwd_microstep: 5135.25 | bwd_inner_microstep: 5066.87 | bwd_allreduce_microstep: 68.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-31 
21:00:24,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.01 | bwd_microstep: 4999.79 | bwd_inner_microstep: 4949.32 | bwd_allreduce_microstep: 50.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 21:00:33,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.88 | bwd_microstep: 5012.74 | bwd_inner_microstep: 4959.17 | bwd_allreduce_microstep: 53.50 | step_microstep: 0.19 dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 2907 [2024-07-31 21:00:42,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 21:00:42,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.36 | bwd_microstep: 5162.05 | bwd_inner_microstep: 4758.49 | bwd_allreduce_microstep: 403.49 | step_microstep: 181.44 [2024-07-31 21:00:42,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28763.97 | bwd: 40930.33 | bwd_inner: 40113.01 | bwd_allreduce: 816.82 | step: 182.13 78%|███████▊ | 958/1230 [18:48:47<5:17:07, 69.96s/it] {'loss': 1.1167, 'learning_rate': 2.4574170321190305e-06, 'epoch': 0.78} 78%|███████▊ | 958/1230 [18:48:47<5:17:07, 69.96s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:00:51,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3895.15 | bwd_microstep: 5426.94 | bwd_inner_microstep: 5407.90 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3850 [2024-07-31 21:01:00,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.29 | bwd_microstep: 5170.65 | bwd_inner_microstep: 5125.42 | bwd_allreduce_microstep: 45.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 21:01:09,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3619.56 | bwd_microstep: 5208.48 | bwd_inner_microstep: 5147.13 | bwd_allreduce_microstep: 61.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 21:01:17,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.13 | bwd_microstep: 5134.05 | bwd_inner_microstep: 5062.45 | bwd_allreduce_microstep: 71.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 21:01:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.13 | bwd_microstep: 5181.44 | bwd_inner_microstep: 5096.12 | bwd_allreduce_microstep: 85.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 21:01:35,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.09 | bwd_microstep: 5028.06 | bwd_inner_microstep: 4988.60 | bwd_allreduce_microstep: 39.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3668 [2024-07-31 21:01:43,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.48 | bwd_microstep: 5108.98 | bwd_inner_microstep: 5029.52 | bwd_allreduce_microstep: 79.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 21:01:52,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 21:01:52,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.54 | bwd_microstep: 4997.21 | bwd_inner_microstep: 4947.55 | bwd_allreduce_microstep: 49.59 | step_microstep: 181.28 [2024-07-31 21:01:52,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29087.28 | bwd: 41255.79 | bwd_inner: 40804.63 | bwd_allreduce: 450.68 | step: 181.87 78%|███████▊ | 959/1230 [18:49:58<5:16:56, 70.17s/it] {'loss': 1.1136, 'learning_rate': 2.440153178956798e-06, 'epoch': 
0.78} 78%|███████▊ | 959/1230 [18:49:58<5:16:56, 70.17s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:02:02,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3872.72 | bwd_microstep: 5410.17 | bwd_inner_microstep: 5391.13 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2812 [2024-07-31 21:02:11,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.81 | bwd_microstep: 5487.38 | bwd_inner_microstep: 5063.69 | bwd_allreduce_microstep: 423.62 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 21:02:20,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.41 | bwd_microstep: 5219.32 | bwd_inner_microstep: 4814.01 | bwd_allreduce_microstep: 405.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-31 21:02:28,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.48 | bwd_microstep: 4991.21 | bwd_inner_microstep: 4934.55 | bwd_allreduce_microstep: 56.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-31 21:02:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.93 | bwd_microstep: 4973.97 | bwd_inner_microstep: 4941.16 | bwd_allreduce_microstep: 32.75 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3699 [2024-07-31 21:02:45,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.43 | bwd_microstep: 4797.57 | bwd_inner_microstep: 4764.55 | bwd_allreduce_microstep: 32.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 21:02:53,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3711.99 | bwd_microstep: 4908.12 | bwd_inner_microstep: 4886.21 | bwd_allreduce_microstep: 21.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 21:03:02,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 21:03:02,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3328.69 | bwd_microstep: 4861.37 | bwd_inner_microstep: 4830.12 | bwd_allreduce_microstep: 31.18 | step_microstep: 182.48 [2024-07-31 21:03:02,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28375.38 | bwd: 40649.10 | bwd_inner: 39625.36 | bwd_allreduce: 1023.26 | step: 183.07 78%|███████▊ | 960/1230 [18:51:07<5:14:40, 69.93s/it] {'loss': 1.1459, 'learning_rate': 2.42294174996935e-06, 'epoch': 0.78} 78%|███████▊ | 960/1230 [18:51:07<5:14:40, 69.93s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2369 [2024-07-31 21:03:10,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3079.44 | bwd_microstep: 5122.67 | bwd_inner_microstep: 4732.45 | bwd_allreduce_microstep: 390.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-31 21:03:19,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.35 | bwd_microstep: 5147.43 | bwd_inner_microstep: 5108.98 | bwd_allreduce_microstep: 38.39 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2049 [2024-07-31 21:03:27,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.35 | bwd_microstep: 5011.74 | bwd_inner_microstep: 4625.30 | bwd_allreduce_microstep: 386.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-31 21:03:35,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.73 | bwd_microstep: 5109.14 | bwd_inner_microstep: 5034.38 | 
bwd_allreduce_microstep: 74.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-31 21:03:44,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.25 | bwd_microstep: 4938.19 | bwd_inner_microstep: 4913.04 | bwd_allreduce_microstep: 25.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-31 21:03:53,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.67 | bwd_microstep: 5097.34 | bwd_inner_microstep: 4702.07 | bwd_allreduce_microstep: 395.20 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-31 21:04:01,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.14 | bwd_microstep: 5020.53 | bwd_inner_microstep: 4979.43 | bwd_allreduce_microstep: 41.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 21:04:10,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 21:04:10,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.26 | bwd_microstep: 5059.02 | bwd_inner_microstep: 4667.46 | bwd_allreduce_microstep: 391.49 | step_microstep: 184.02 [2024-07-31 21:04:10,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27760.12 | bwd: 40506.03 | bwd_inner: 38763.05 | bwd_allreduce: 1742.49 | step: 184.61 78%|███████▊ | 961/1230 [18:52:16<5:11:42, 69.53s/it] {'loss': 1.1848, 'learning_rate': 2.40578286451029e-06, 'epoch': 0.78} 78%|███████▊ | 961/1230 [18:52:16<5:11:42, 69.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3542 [2024-07-31 21:04:19,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.26 | bwd_microstep: 5402.67 | bwd_inner_microstep: 5243.12 | bwd_allreduce_microstep: 159.47 | step_microstep: 0.10 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 21:04:28,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3385.74 | bwd_microstep: 5108.40 | bwd_inner_microstep: 5043.40 | bwd_allreduce_microstep: 64.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3793 [2024-07-31 21:04:37,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.28 | bwd_microstep: 5034.67 | bwd_inner_microstep: 5015.31 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-31 21:04:45,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.92 | bwd_microstep: 5004.53 | bwd_inner_microstep: 4985.12 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 21:04:54,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.52 | bwd_microstep: 5219.04 | bwd_inner_microstep: 5136.42 | bwd_allreduce_microstep: 82.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 21:05:03,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.91 | bwd_microstep: 4905.71 | bwd_inner_microstep: 4883.70 | bwd_allreduce_microstep: 21.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 21:05:11,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.24 | bwd_microstep: 5060.92 | bwd_inner_microstep: 4667.74 | bwd_allreduce_microstep: 393.11 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3703 [2024-07-31 21:05:20,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-31 21:05:20,577] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3495.46 | bwd_microstep: 4949.03 | bwd_inner_microstep: 4889.07 | bwd_allreduce_microstep: 59.89 | step_microstep: 181.81 [2024-07-31 21:05:20,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28885.24 | bwd: 40684.96 | bwd_inner: 39863.82 | bwd_allreduce: 820.64 | step: 182.51 78%|███████▊ | 962/1230 [18:53:26<5:11:03, 69.64s/it] {'loss': 1.1791, 'learning_rate': 2.3886766415688567e-06, 'epoch': 0.78} 78%|███████▊ | 962/1230 [18:53:26<5:11:03, 69.64s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:05:29,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3881.34 | bwd_microstep: 5373.01 | bwd_inner_microstep: 5353.93 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-31 21:05:38,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3385.29 | bwd_microstep: 5218.29 | bwd_inner_microstep: 5143.82 | bwd_allreduce_microstep: 74.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823 [2024-07-31 21:05:47,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.43 | bwd_microstep: 4979.27 | bwd_inner_microstep: 4949.75 | bwd_allreduce_microstep: 29.45 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2232 [2024-07-31 21:05:55,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.23 | bwd_microstep: 5263.16 | bwd_inner_microstep: 4851.20 | bwd_allreduce_microstep: 411.89 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3786 [2024-07-31 21:06:04,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.80 | bwd_microstep: 5204.16 | bwd_inner_microstep: 5133.80 | bwd_allreduce_microstep: 70.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images 
per sample: 13.0, dynamic token length: 3788 [2024-07-31 21:06:13,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.09 | bwd_microstep: 5019.55 | bwd_inner_microstep: 5000.19 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2101 [2024-07-31 21:06:22,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.53 | bwd_microstep: 5123.50 | bwd_inner_microstep: 4727.85 | bwd_allreduce_microstep: 395.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 21:06:31,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 21:06:31,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.98 | bwd_microstep: 5048.52 | bwd_inner_microstep: 4988.81 | bwd_allreduce_microstep: 59.64 | step_microstep: 181.28 [2024-07-31 21:06:31,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28893.59 | bwd: 41229.41 | bwd_inner: 40149.29 | bwd_allreduce: 1079.61 | step: 181.87 78%|███████▊ | 963/1230 [18:54:36<5:10:59, 69.88s/it] {'loss': 1.1055, 'learning_rate': 2.3716231997691007e-06, 'epoch': 0.78} 78%|███████▊ | 963/1230 [18:54:36<5:10:59, 69.88s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2271 [2024-07-31 21:06:39,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.39 | bwd_microstep: 5339.82 | bwd_inner_microstep: 4930.30 | bwd_allreduce_microstep: 409.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3911 [2024-07-31 21:06:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3837.05 | bwd_microstep: 5337.77 | bwd_inner_microstep: 5293.18 | bwd_allreduce_microstep: 44.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-07-31 21:06:57,609] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3296.71 | bwd_microstep: 5113.72 | bwd_inner_microstep: 4719.16 | bwd_allreduce_microstep: 394.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-31 21:07:06,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.23 | bwd_microstep: 5000.48 | bwd_inner_microstep: 4981.12 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 21:07:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.19 | bwd_microstep: 5135.17 | bwd_inner_microstep: 5060.03 | bwd_allreduce_microstep: 75.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 21:07:23,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.83 | bwd_microstep: 5003.97 | bwd_inner_microstep: 4984.66 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-31 21:07:32,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.90 | bwd_microstep: 5040.83 | bwd_inner_microstep: 4986.11 | bwd_allreduce_microstep: 54.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 21:07:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 21:07:41,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.40 | bwd_microstep: 4940.21 | bwd_inner_microstep: 4912.14 | bwd_allreduce_microstep: 28.00 | step_microstep: 181.30 [2024-07-31 21:07:41,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29089.61 | bwd: 40911.94 | bwd_inner: 39866.66 | bwd_allreduce: 1044.80 | step: 181.87 78%|███████▊ | 964/1230 [18:55:47<5:10:24, 70.02s/it] {'loss': 1.1378, 
'learning_rate': 2.354622657369048e-06, 'epoch': 0.78} 78%|███████▊ | 964/1230 [18:55:47<5:10:24, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3945 [2024-07-31 21:07:50,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.01 | bwd_microstep: 5543.34 | bwd_inner_microstep: 5454.70 | bwd_allreduce_microstep: 88.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2271 [2024-07-31 21:07:59,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.17 | bwd_microstep: 5201.10 | bwd_inner_microstep: 4799.51 | bwd_allreduce_microstep: 401.52 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2049 [2024-07-31 21:08:08,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.42 | bwd_microstep: 5246.35 | bwd_inner_microstep: 4838.63 | bwd_allreduce_microstep: 407.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-31 21:08:16,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.11 | bwd_microstep: 5013.89 | bwd_inner_microstep: 4994.54 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-31 21:08:25,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.75 | bwd_microstep: 5028.05 | bwd_inner_microstep: 5008.70 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3645 [2024-07-31 21:08:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.31 | bwd_microstep: 5173.85 | bwd_inner_microstep: 5071.07 | bwd_allreduce_microstep: 102.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-31 21:08:42,462] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.72 | bwd_microstep: 4863.55 | bwd_inner_microstep: 4489.21 | bwd_allreduce_microstep: 374.27 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2671 [2024-07-31 21:08:51,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 21:08:51,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.71 | bwd_microstep: 5094.29 | bwd_inner_microstep: 4698.91 | bwd_allreduce_microstep: 395.28 | step_microstep: 186.25 [2024-07-31 21:08:51,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28432.09 | bwd: 41164.40 | bwd_inner: 39355.21 | bwd_allreduce: 1808.68 | step: 186.83 78%|███████▊ | 965/1230 [18:56:57<5:09:07, 69.99s/it] {'loss': 1.1409, 'learning_rate': 2.3376751322599e-06, 'epoch': 0.78} 78%|███████▊ | 965/1230 [18:56:57<5:09:07, 69.99s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:09:00,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3884.80 | bwd_microstep: 5459.66 | bwd_inner_microstep: 5431.65 | bwd_allreduce_microstep: 27.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305 [2024-07-31 21:09:09,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.02 | bwd_microstep: 5331.15 | bwd_inner_microstep: 4915.15 | bwd_allreduce_microstep: 415.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 21:09:18,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.91 | bwd_microstep: 5107.36 | bwd_inner_microstep: 5072.83 | bwd_allreduce_microstep: 34.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3836 [2024-07-31 21:09:27,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.75 | 
bwd_microstep: 5205.41 | bwd_inner_microstep: 5150.79 | bwd_allreduce_microstep: 54.55 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3762 [2024-07-31 21:09:36,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.16 | bwd_microstep: 5094.94 | bwd_inner_microstep: 5036.70 | bwd_allreduce_microstep: 58.17 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2185 [2024-07-31 21:09:44,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.43 | bwd_microstep: 5061.79 | bwd_inner_microstep: 4668.19 | bwd_allreduce_microstep: 393.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 21:09:52,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3342.06 | bwd_microstep: 5024.44 | bwd_inner_microstep: 4639.25 | bwd_allreduce_microstep: 385.11 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2122 [2024-07-31 21:10:01,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 21:10:01,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.92 | bwd_microstep: 5117.01 | bwd_inner_microstep: 4720.50 | bwd_allreduce_microstep: 396.44 | step_microstep: 181.78 [2024-07-31 21:10:01,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28781.97 | bwd: 41401.72 | bwd_inner: 39635.00 | bwd_allreduce: 1766.24 | step: 182.36 79%|███████▊ | 966/1230 [18:58:07<5:08:39, 70.15s/it] {'loss': 1.0695, 'learning_rate': 2.320780741965206e-06, 'epoch': 0.79} 79%|███████▊ | 966/1230 [18:58:07<5:08:39, 70.15s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3967 [2024-07-31 21:10:11,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.61 | bwd_microstep: 5568.69 | bwd_inner_microstep: 5458.10 | 
bwd_allreduce_microstep: 110.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3562 [2024-07-31 21:10:19,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.05 | bwd_microstep: 5142.34 | bwd_inner_microstep: 5057.48 | bwd_allreduce_microstep: 84.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-31 21:10:28,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.30 | bwd_microstep: 5109.62 | bwd_inner_microstep: 5032.29 | bwd_allreduce_microstep: 77.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-31 21:10:37,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.18 | bwd_microstep: 5195.78 | bwd_inner_microstep: 5115.77 | bwd_allreduce_microstep: 79.95 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3729 [2024-07-31 21:10:46,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.92 | bwd_microstep: 5152.36 | bwd_inner_microstep: 5079.99 | bwd_allreduce_microstep: 72.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 21:10:54,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.71 | bwd_microstep: 4996.73 | bwd_inner_microstep: 4944.30 | bwd_allreduce_microstep: 52.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-31 21:11:03,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.30 | bwd_microstep: 5110.71 | bwd_inner_microstep: 4713.90 | bwd_allreduce_microstep: 396.74 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 21:11:12,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 
[2024-07-31 21:11:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.30 | bwd_microstep: 5105.07 | bwd_inner_microstep: 5035.98 | bwd_allreduce_microstep: 69.02 | step_microstep: 181.58 [2024-07-31 21:11:12,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28803.29 | bwd: 41381.28 | bwd_inner: 40437.72 | bwd_allreduce: 943.05 | step: 182.27 79%|███████▊ | 967/1230 [18:59:18<5:07:57, 70.26s/it] {'loss': 1.1121, 'learning_rate': 2.3039396036400484e-06, 'epoch': 0.79} 79%|███████▊ | 967/1230 [18:59:18<5:07:57, 70.26s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4005 [2024-07-31 21:11:21,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3860.69 | bwd_microstep: 5267.07 | bwd_inner_microstep: 5248.01 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3778 [2024-07-31 21:11:30,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.65 | bwd_microstep: 5232.41 | bwd_inner_microstep: 5177.75 | bwd_allreduce_microstep: 54.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-07-31 21:11:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.06 | bwd_microstep: 5336.50 | bwd_inner_microstep: 5277.63 | bwd_allreduce_microstep: 58.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-07-31 21:11:48,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.64 | bwd_microstep: 5041.07 | bwd_inner_microstep: 5016.12 | bwd_allreduce_microstep: 24.89 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 21:11:57,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.24 | bwd_microstep: 5113.86 | bwd_inner_microstep: 5045.74 | 
bwd_allreduce_microstep: 68.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2173 [2024-07-31 21:12:05,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.91 | bwd_microstep: 5036.13 | bwd_inner_microstep: 4645.20 | bwd_allreduce_microstep: 390.87 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-31 21:12:14,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.77 | bwd_microstep: 5014.96 | bwd_inner_microstep: 4993.58 | bwd_allreduce_microstep: 21.31 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2156 [2024-07-31 21:12:22,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:12:22,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2996.30 | bwd_microstep: 4883.06 | bwd_inner_microstep: 4508.45 | bwd_allreduce_microstep: 374.54 | step_microstep: 183.05 [2024-07-31 21:12:22,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28880.15 | bwd: 40925.04 | bwd_inner: 39912.42 | bwd_allreduce: 1012.15 | step: 183.63 79%|███████▊ | 968/1230 [19:00:28<5:06:38, 70.22s/it] {'loss': 1.1278, 'learning_rate': 2.2871518340702236e-06, 'epoch': 0.79} 79%|███████▊ | 968/1230 [19:00:28<5:06:38, 70.22s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3943 [2024-07-31 21:12:31,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.97 | bwd_microstep: 5211.51 | bwd_inner_microstep: 5178.65 | bwd_allreduce_microstep: 32.78 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-31 21:12:40,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.14 | bwd_microstep: 5034.62 | bwd_inner_microstep: 5015.40 | bwd_allreduce_microstep: 19.16 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-31 21:12:49,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.23 | bwd_microstep: 5053.20 | bwd_inner_microstep: 5030.08 | bwd_allreduce_microstep: 23.06 | step_microstep: 0.08 dynamic ViT batch size: 13, images per sample: 6.5, dynamic token length: 3084 [2024-07-31 21:12:57,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.15 | bwd_microstep: 5233.17 | bwd_inner_microstep: 4856.58 | bwd_allreduce_microstep: 376.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3613 [2024-07-31 21:13:06,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.51 | bwd_microstep: 5116.58 | bwd_inner_microstep: 5029.37 | bwd_allreduce_microstep: 87.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-31 21:13:15,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.38 | bwd_microstep: 4985.15 | bwd_inner_microstep: 4965.74 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-31 21:13:24,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.37 | bwd_microstep: 4932.30 | bwd_inner_microstep: 4907.41 | bwd_allreduce_microstep: 24.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-07-31 21:13:32,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85 [2024-07-31 21:13:32,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.12 | bwd_microstep: 4883.14 | bwd_inner_microstep: 4863.81 | bwd_allreduce_microstep: 19.27 | step_microstep: 182.12 [2024-07-31 21:13:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29568.77 | bwd: 40449.65 | bwd_inner: 39846.97 | bwd_allreduce: 
602.20 | step: 182.70 79%|███████▉ | 969/1230 [19:01:38<5:05:38, 70.26s/it] {'loss': 1.1539, 'learning_rate': 2.2704175496714552e-06, 'epoch': 0.79} 79%|███████▉ | 969/1230 [19:01:38<5:05:38, 70.26s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3791 [2024-07-31 21:13:41,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.11 | bwd_microstep: 5250.50 | bwd_inner_microstep: 5208.54 | bwd_allreduce_microstep: 41.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-31 21:13:50,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.78 | bwd_microstep: 5264.78 | bwd_inner_microstep: 4854.58 | bwd_allreduce_microstep: 410.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-31 21:13:59,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.28 | bwd_microstep: 5007.04 | bwd_inner_microstep: 4970.90 | bwd_allreduce_microstep: 36.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 21:14:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.51 | bwd_microstep: 5172.30 | bwd_inner_microstep: 5096.50 | bwd_allreduce_microstep: 75.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 21:14:16,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.80 | bwd_microstep: 5032.76 | bwd_inner_microstep: 4977.13 | bwd_allreduce_microstep: 55.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-31 21:14:25,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.07 | bwd_microstep: 5145.17 | bwd_inner_microstep: 5098.10 | bwd_allreduce_microstep: 47.01 | step_microstep: 0.08 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3655 [2024-07-31 21:14:34,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.34 | bwd_microstep: 5190.80 | bwd_inner_microstep: 5108.18 | bwd_allreduce_microstep: 82.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 21:14:43,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 21:14:43,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.48 | bwd_microstep: 4928.94 | bwd_inner_microstep: 4902.37 | bwd_allreduce_microstep: 26.51 | step_microstep: 181.78 [2024-07-31 21:14:43,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29132.26 | bwd: 40992.27 | bwd_inner: 40216.23 | bwd_allreduce: 775.57 | step: 182.37 79%|███████▉ | 970/1230 [19:02:49<5:04:43, 70.32s/it] {'loss': 1.2019, 'learning_rate': 2.2537368664885527e-06, 'epoch': 0.79} 79%|███████▉ | 970/1230 [19:02:49<5:04:43, 70.32s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4046 [2024-07-31 21:14:52,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.45 | bwd_microstep: 5412.03 | bwd_inner_microstep: 5372.93 | bwd_allreduce_microstep: 39.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-31 21:15:01,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.61 | bwd_microstep: 5186.46 | bwd_inner_microstep: 5103.83 | bwd_allreduce_microstep: 82.56 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2023 [2024-07-31 21:15:09,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.19 | bwd_microstep: 5194.82 | bwd_inner_microstep: 4789.51 | bwd_allreduce_microstep: 405.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 21:15:18,807] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.13 | bwd_microstep: 5242.99 | bwd_inner_microstep: 4836.24 | bwd_allreduce_microstep: 406.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-07-31 21:15:27,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.53 | bwd_microstep: 5026.38 | bwd_inner_microstep: 5004.50 | bwd_allreduce_microstep: 21.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 21:15:36,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.54 | bwd_microstep: 5066.23 | bwd_inner_microstep: 4671.89 | bwd_allreduce_microstep: 394.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-31 21:15:44,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.32 | bwd_microstep: 5006.78 | bwd_inner_microstep: 4949.86 | bwd_allreduce_microstep: 56.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 21:15:53,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 21:15:53,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.11 | bwd_microstep: 5060.85 | bwd_inner_microstep: 4999.07 | bwd_allreduce_microstep: 61.71 | step_microstep: 182.46 [2024-07-31 21:15:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28806.78 | bwd: 41196.53 | bwd_inner: 39727.77 | bwd_allreduce: 1468.27 | step: 183.05 79%|███████▉ | 971/1230 [19:03:59<5:03:34, 70.32s/it] {'loss': 1.1149, 'learning_rate': 2.237109900194642e-06, 'epoch': 0.79} 79%|███████▉ | 971/1230 [19:03:59<5:03:34, 70.32s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:16:02,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3870.66 
| bwd_microstep: 5406.31 | bwd_inner_microstep: 5387.19 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2350 [2024-07-31 21:16:11,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.86 | bwd_microstep: 5254.00 | bwd_inner_microstep: 4845.87 | bwd_allreduce_microstep: 408.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-07-31 21:16:20,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.10 | bwd_microstep: 5141.15 | bwd_inner_microstep: 4742.54 | bwd_allreduce_microstep: 398.54 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 21:16:29,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.05 | bwd_microstep: 5243.52 | bwd_inner_microstep: 5151.64 | bwd_allreduce_microstep: 91.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 21:16:38,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.88 | bwd_microstep: 5032.45 | bwd_inner_microstep: 5006.10 | bwd_allreduce_microstep: 26.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 21:16:46,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.52 | bwd_microstep: 5167.34 | bwd_inner_microstep: 5089.20 | bwd_allreduce_microstep: 78.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 21:16:55,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.22 | bwd_microstep: 5137.83 | bwd_inner_microstep: 4740.52 | bwd_allreduce_microstep: 397.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3643 [2024-07-31 21:17:04,448] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:17:04,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.88 | bwd_microstep: 5065.46 | bwd_inner_microstep: 4982.50 | bwd_allreduce_microstep: 82.89 | step_microstep: 182.31 [2024-07-31 21:17:04,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29066.08 | bwd: 41448.03 | bwd_inner: 39945.49 | bwd_allreduce: 1502.05 | step: 182.89 79%|███████▉ | 972/1230 [19:05:10<5:03:04, 70.48s/it] {'loss': 1.1227, 'learning_rate': 2.2205367660903267e-06, 'epoch': 0.79} 79%|███████▉ | 972/1230 [19:05:10<5:03:04, 70.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3861 [2024-07-31 21:17:13,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.75 | bwd_microstep: 5345.85 | bwd_inner_microstep: 5284.69 | bwd_allreduce_microstep: 61.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2283 [2024-07-31 21:17:22,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.77 | bwd_microstep: 5198.83 | bwd_inner_microstep: 4796.50 | bwd_allreduce_microstep: 402.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-31 21:17:31,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.55 | bwd_microstep: 5083.15 | bwd_inner_microstep: 5051.70 | bwd_allreduce_microstep: 31.38 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3615 [2024-07-31 21:17:39,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.44 | bwd_microstep: 5057.96 | bwd_inner_microstep: 5008.24 | bwd_allreduce_microstep: 49.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-31 21:17:48,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.80 | 
bwd_microstep: 5203.06 | bwd_inner_microstep: 5141.64 | bwd_allreduce_microstep: 61.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-31 21:17:57,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.87 | bwd_microstep: 5122.28 | bwd_inner_microstep: 5047.70 | bwd_allreduce_microstep: 74.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 21:18:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.48 | bwd_microstep: 4909.46 | bwd_inner_microstep: 4886.64 | bwd_allreduce_microstep: 22.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-31 21:18:14,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 21:18:14,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.68 | bwd_microstep: 5052.82 | bwd_inner_microstep: 4988.39 | bwd_allreduce_microstep: 64.36 | step_microstep: 182.93 [2024-07-31 21:18:14,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29159.25 | bwd: 40973.41 | bwd_inner: 40205.44 | bwd_allreduce: 767.49 | step: 183.53 79%|███████▉ | 973/1230 [19:06:20<5:01:52, 70.48s/it] {'loss': 1.1557, 'learning_rate': 2.2040175791029305e-06, 'epoch': 0.79} 79%|███████▉ | 973/1230 [19:06:20<5:01:52, 70.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3915 [2024-07-31 21:18:24,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.42 | bwd_microstep: 5538.04 | bwd_inner_microstep: 5447.44 | bwd_allreduce_microstep: 90.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-31 21:18:33,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.07 | bwd_microstep: 5152.46 | bwd_inner_microstep: 5121.83 | 
bwd_allreduce_microstep: 30.56 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2027 [2024-07-31 21:18:41,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.07 | bwd_microstep: 5223.43 | bwd_inner_microstep: 4817.28 | bwd_allreduce_microstep: 406.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 21:18:50,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.54 | bwd_microstep: 4998.61 | bwd_inner_microstep: 4960.67 | bwd_allreduce_microstep: 37.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-31 21:18:59,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.21 | bwd_microstep: 5084.07 | bwd_inner_microstep: 5039.51 | bwd_allreduce_microstep: 44.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-31 21:19:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.82 | bwd_microstep: 5124.87 | bwd_inner_microstep: 4727.96 | bwd_allreduce_microstep: 396.84 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2623 [2024-07-31 21:19:16,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.35 | bwd_microstep: 5232.08 | bwd_inner_microstep: 4825.75 | bwd_allreduce_microstep: 406.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 21:19:25,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 21:19:25,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.71 | bwd_microstep: 5029.67 | bwd_inner_microstep: 4972.40 | bwd_allreduce_microstep: 57.20 | step_microstep: 181.53 [2024-07-31 21:19:25,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29058.09 | bwd: 41383.21 | bwd_inner: 39912.77 | bwd_allreduce: 1469.96 | step: 182.11 79%|███████▉ | 974/1230 [19:07:31<5:01:04, 70.57s/it] {'loss': 1.1264, 'learning_rate': 2.187552453785662e-06, 'epoch': 0.79} 79%|███████▉ | 974/1230 [19:07:31<5:01:04, 70.57s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3884 [2024-07-31 21:19:34,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.91 | bwd_microstep: 5295.61 | bwd_inner_microstep: 5253.93 | bwd_allreduce_microstep: 41.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 21:19:42,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.00 | bwd_microstep: 4789.60 | bwd_inner_microstep: 4770.21 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825 [2024-07-31 21:19:51,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.06 | bwd_microstep: 5037.93 | bwd_inner_microstep: 5018.65 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-31 21:20:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.77 | bwd_microstep: 4999.15 | bwd_inner_microstep: 4979.36 | bwd_allreduce_microstep: 19.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-31 21:20:09,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.82 | bwd_microstep: 5170.22 | bwd_inner_microstep: 4769.42 | bwd_allreduce_microstep: 400.73 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2199 [2024-07-31 21:20:17,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.01 | bwd_microstep: 5152.64 | bwd_inner_microstep: 4753.83 | 
bwd_allreduce_microstep: 398.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-31 21:20:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.01 | bwd_microstep: 4988.19 | bwd_inner_microstep: 4968.85 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 21:20:35,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 21:20:35,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.68 | bwd_microstep: 5158.98 | bwd_inner_microstep: 5087.49 | bwd_allreduce_microstep: 71.43 | step_microstep: 181.74 [2024-07-31 21:20:35,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28966.16 | bwd: 40592.31 | bwd_inner: 39601.69 | bwd_allreduce: 990.14 | step: 182.31 79%|███████▉ | 975/1230 [19:08:41<4:59:02, 70.36s/it] {'loss': 1.1483, 'learning_rate': 2.1711415043168425e-06, 'epoch': 0.79} 79%|███████▉ | 975/1230 [19:08:41<4:59:02, 70.36s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3569 [2024-07-31 21:20:44,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.85 | bwd_microstep: 5259.25 | bwd_inner_microstep: 5153.13 | bwd_allreduce_microstep: 106.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-31 21:20:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.43 | bwd_microstep: 5214.69 | bwd_inner_microstep: 5129.80 | bwd_allreduce_microstep: 84.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-31 21:21:02,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.29 | bwd_microstep: 5199.15 | bwd_inner_microstep: 4795.54 | bwd_allreduce_microstep: 403.55 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 21:21:10,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.24 | bwd_microstep: 5038.97 | bwd_inner_microstep: 5017.75 | bwd_allreduce_microstep: 21.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 21:21:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.92 | bwd_microstep: 4934.47 | bwd_inner_microstep: 4908.93 | bwd_allreduce_microstep: 25.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 21:21:28,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.42 | bwd_microstep: 5258.39 | bwd_inner_microstep: 5131.47 | bwd_allreduce_microstep: 126.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-31 21:21:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.82 | bwd_microstep: 4894.83 | bwd_inner_microstep: 4875.55 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 21:21:46,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 21:21:46,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.95 | bwd_microstep: 5173.79 | bwd_inner_microstep: 5097.48 | bwd_allreduce_microstep: 76.24 | step_microstep: 181.90 [2024-07-31 21:21:46,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29351.81 | bwd: 40973.52 | bwd_inner: 40109.59 | bwd_allreduce: 863.44 | step: 182.49 79%|███████▉ | 976/1230 [19:09:52<4:58:15, 70.45s/it] {'loss': 1.1494, 'learning_rate': 2.1547848444991004e-06, 'epoch': 0.79} 79%|███████▉ | 976/1230 [19:09:52<4:58:15, 70.45s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4046 [2024-07-31 
21:21:54,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3334.88 | bwd_microstep: 5154.35 | bwd_inner_microstep: 5135.29 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808 [2024-07-31 21:22:03,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.60 | bwd_microstep: 5036.10 | bwd_inner_microstep: 5016.77 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3055 [2024-07-31 21:22:12,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.27 | bwd_microstep: 5124.42 | bwd_inner_microstep: 4834.88 | bwd_allreduce_microstep: 289.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-31 21:22:20,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.75 | bwd_microstep: 5090.32 | bwd_inner_microstep: 5045.44 | bwd_allreduce_microstep: 44.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-31 21:22:29,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.50 | bwd_microstep: 5171.62 | bwd_inner_microstep: 4771.08 | bwd_allreduce_microstep: 400.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 21:22:38,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.07 | bwd_microstep: 5170.46 | bwd_inner_microstep: 5117.54 | bwd_allreduce_microstep: 52.86 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 21:22:47,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.74 | bwd_microstep: 5088.14 | bwd_inner_microstep: 4694.19 | bwd_allreduce_microstep: 393.89 | step_microstep: 0.08 dynamic ViT batch size: 8, images 
per sample: 4.0, dynamic token length: 2140 [2024-07-31 21:22:55,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 21:22:55,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.46 | bwd_microstep: 5122.94 | bwd_inner_microstep: 4725.51 | bwd_allreduce_microstep: 397.37 | step_microstep: 181.32 [2024-07-31 21:22:55,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28400.17 | bwd: 40958.34 | bwd_inner: 39340.64 | bwd_allreduce: 1617.19 | step: 181.90 79%|███████▉ | 977/1230 [19:11:01<4:56:06, 70.22s/it] {'loss': 1.1849, 'learning_rate': 2.138482587758605e-06, 'epoch': 0.79} 79%|███████▉ | 977/1230 [19:11:01<4:56:06, 70.22s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3933 [2024-07-31 21:23:04,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.44 | bwd_microstep: 5244.67 | bwd_inner_microstep: 5198.52 | bwd_allreduce_microstep: 46.08 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-31 21:23:14,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.63 | bwd_microstep: 5483.33 | bwd_inner_microstep: 5393.40 | bwd_allreduce_microstep: 89.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 21:23:22,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.31 | bwd_microstep: 5094.04 | bwd_inner_microstep: 5054.44 | bwd_allreduce_microstep: 39.54 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3739 [2024-07-31 21:23:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.68 | bwd_microstep: 5053.85 | bwd_inner_microstep: 4997.01 | bwd_allreduce_microstep: 56.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 21:23:39,978] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.13 | bwd_microstep: 5029.53 | bwd_inner_microstep: 4975.30 | bwd_allreduce_microstep: 54.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-31 21:23:48,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.96 | bwd_microstep: 5110.07 | bwd_inner_microstep: 5040.05 | bwd_allreduce_microstep: 69.96 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3664 [2024-07-31 21:23:57,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.88 | bwd_microstep: 5056.71 | bwd_inner_microstep: 4978.60 | bwd_allreduce_microstep: 78.04 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3690 [2024-07-31 21:24:06,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 21:24:06,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.54 | bwd_microstep: 5019.95 | bwd_inner_microstep: 4950.53 | bwd_allreduce_microstep: 69.35 | step_microstep: 182.63 [2024-07-31 21:24:06,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28782.47 | bwd: 41092.14 | bwd_inner: 40587.80 | bwd_allreduce: 503.86 | step: 183.32 80%|███████▉ | 978/1230 [19:12:12<4:54:55, 70.22s/it] {'loss': 1.1274, 'learning_rate': 2.1222348471442477e-06, 'epoch': 0.8} 80%|███████▉ | 978/1230 [19:12:12<4:54:55, 70.22s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2422 [2024-07-31 21:24:15,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.89 | bwd_microstep: 5628.06 | bwd_inner_microstep: 5196.16 | bwd_allreduce_microstep: 431.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3841 [2024-07-31 21:24:24,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.88 | 
bwd_microstep: 5093.03 | bwd_inner_microstep: 5073.77 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-31 21:24:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.90 | bwd_microstep: 5161.91 | bwd_inner_microstep: 5121.10 | bwd_allreduce_microstep: 40.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 21:24:41,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.40 | bwd_microstep: 4958.17 | bwd_inner_microstep: 4575.83 | bwd_allreduce_microstep: 382.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 21:24:50,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.82 | bwd_microstep: 4995.89 | bwd_inner_microstep: 4974.69 | bwd_allreduce_microstep: 21.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673 [2024-07-31 21:24:58,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.83 | bwd_microstep: 4936.72 | bwd_inner_microstep: 4908.17 | bwd_allreduce_microstep: 28.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 21:25:07,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.68 | bwd_microstep: 4987.50 | bwd_inner_microstep: 4968.18 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-31 21:25:15,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.89 [2024-07-31 21:25:15,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.58 | bwd_microstep: 4713.77 | bwd_inner_microstep: 4689.25 | bwd_allreduce_microstep: 24.43 | step_microstep: 182.32 [2024-07-31 
21:25:15,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28708.88 | bwd: 40475.03 | bwd_inner: 39507.10 | bwd_allreduce: 967.44 | step: 182.90 80%|███████▉ | 979/1230 [19:13:21<4:52:52, 70.01s/it] {'loss': 1.1215, 'learning_rate': 2.1060417353268845e-06, 'epoch': 0.8} 80%|███████▉ | 979/1230 [19:13:21<4:52:52, 70.01s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 21:25:24,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.99 | bwd_microstep: 5511.20 | bwd_inner_microstep: 5463.46 | bwd_allreduce_microstep: 47.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-31 21:25:33,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.90 | bwd_microstep: 5269.68 | bwd_inner_microstep: 5202.08 | bwd_allreduce_microstep: 67.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-31 21:25:42,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.20 | bwd_microstep: 5223.31 | bwd_inner_microstep: 5162.12 | bwd_allreduce_microstep: 61.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 21:25:51,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.03 | bwd_microstep: 5173.25 | bwd_inner_microstep: 5090.53 | bwd_allreduce_microstep: 82.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-31 21:25:59,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.49 | bwd_microstep: 4836.78 | bwd_inner_microstep: 4790.63 | bwd_allreduce_microstep: 46.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 21:26:08,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.10 | 
bwd_microstep: 5077.34 | bwd_inner_microstep: 4682.88 | bwd_allreduce_microstep: 394.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 21:26:16,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.74 | bwd_microstep: 4923.99 | bwd_inner_microstep: 4903.44 | bwd_allreduce_microstep: 20.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 21:26:25,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 21:26:25,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.19 | bwd_microstep: 4990.03 | bwd_inner_microstep: 4941.68 | bwd_allreduce_microstep: 48.28 | step_microstep: 181.54 [2024-07-31 21:26:25,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28558.55 | bwd: 41005.56 | bwd_inner: 40236.78 | bwd_allreduce: 768.30 | step: 182.12 80%|███████▉ | 980/1230 [19:14:31<4:51:34, 69.98s/it] {'loss': 1.1129, 'learning_rate': 2.0899033645985423e-06, 'epoch': 0.8} 80%|███████▉ | 980/1230 [19:14:31<4:51:34, 69.98s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4070 [2024-07-31 21:26:34,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.72 | bwd_microstep: 5400.21 | bwd_inner_microstep: 5381.15 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-31 21:26:43,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.95 | bwd_microstep: 5173.94 | bwd_inner_microstep: 4772.61 | bwd_allreduce_microstep: 401.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3828 [2024-07-31 21:26:51,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.62 | bwd_microstep: 4866.94 | bwd_inner_microstep: 4847.38 | 
bwd_allreduce_microstep: 19.50 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2841 [2024-07-31 21:27:00,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.40 | bwd_microstep: 5152.42 | bwd_inner_microstep: 4751.38 | bwd_allreduce_microstep: 400.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 21:27:09,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.58 | bwd_microstep: 4990.14 | bwd_inner_microstep: 4952.70 | bwd_allreduce_microstep: 37.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-31 21:27:17,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3242.25 | bwd_microstep: 4852.55 | bwd_inner_microstep: 4825.18 | bwd_allreduce_microstep: 27.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-31 21:27:26,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.18 | bwd_microstep: 5195.38 | bwd_inner_microstep: 5119.83 | bwd_allreduce_microstep: 75.47 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 21:27:34,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 21:27:34,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.53 | bwd_microstep: 5073.60 | bwd_inner_microstep: 5013.97 | bwd_allreduce_microstep: 59.56 | step_microstep: 181.35 [2024-07-31 21:27:34,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28304.13 | bwd: 40705.17 | bwd_inner: 39664.15 | bwd_allreduce: 1040.54 | step: 181.93 80%|███████▉ | 981/1230 [19:15:40<4:49:36, 69.79s/it] {'loss': 1.1245, 'learning_rate': 2.073819846871646e-06, 'epoch': 0.8} 80%|███████▉ | 981/1230 [19:15:40<4:49:36, 69.79s/it]dynamic ViT batch size: 
23, images per sample: 11.5, dynamic token length: 4089 [2024-07-31 21:27:43,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.91 | bwd_microstep: 5233.45 | bwd_inner_microstep: 5214.39 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3854 [2024-07-31 21:27:52,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.09 | bwd_microstep: 5306.63 | bwd_inner_microstep: 5216.29 | bwd_allreduce_microstep: 90.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-31 21:28:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.50 | bwd_microstep: 5304.50 | bwd_inner_microstep: 5238.04 | bwd_allreduce_microstep: 66.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3118 [2024-07-31 21:28:09,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3062.42 | bwd_microstep: 4821.27 | bwd_inner_microstep: 4656.26 | bwd_allreduce_microstep: 164.95 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2817 [2024-07-31 21:28:17,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3073.78 | bwd_microstep: 5003.99 | bwd_inner_microstep: 4635.13 | bwd_allreduce_microstep: 368.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2132 [2024-07-31 21:28:26,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.53 | bwd_microstep: 5066.11 | bwd_inner_microstep: 4674.79 | bwd_allreduce_microstep: 391.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-31 21:28:34,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.24 | bwd_microstep: 4879.64 | bwd_inner_microstep: 4860.29 | 
bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3670 [2024-07-31 21:28:43,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:28:43,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3117.91 | bwd_microstep: 4992.72 | bwd_inner_microstep: 4930.98 | bwd_allreduce_microstep: 61.68 | step_microstep: 181.52 [2024-07-31 21:28:43,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27379.30 | bwd: 40608.30 | bwd_inner: 39426.11 | bwd_allreduce: 1181.70 | step: 182.10 80%|███████▉ | 982/1230 [19:16:49<4:46:37, 69.34s/it] {'loss': 1.1335, 'learning_rate': 2.0577912936782317e-06, 'epoch': 0.8} 80%|███████▉ | 982/1230 [19:16:49<4:46:37, 69.34s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3968 [2024-07-31 21:28:52,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.04 | bwd_microstep: 5443.40 | bwd_inner_microstep: 5381.94 | bwd_allreduce_microstep: 61.40 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 21:29:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.64 | bwd_microstep: 5185.39 | bwd_inner_microstep: 5139.26 | bwd_allreduce_microstep: 46.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-31 21:29:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.90 | bwd_microstep: 5134.40 | bwd_inner_microstep: 5084.80 | bwd_allreduce_microstep: 49.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-31 21:29:18,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.81 | bwd_microstep: 4996.72 | bwd_inner_microstep: 4977.40 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2185 [2024-07-31 21:29:27,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.44 | bwd_microstep: 5119.04 | bwd_inner_microstep: 4723.12 | bwd_allreduce_microstep: 395.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 21:29:36,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.95 | bwd_microstep: 5052.86 | bwd_inner_microstep: 5009.85 | bwd_allreduce_microstep: 42.95 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3644 [2024-07-31 21:29:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.24 | bwd_microstep: 5084.53 | bwd_inner_microstep: 5006.96 | bwd_allreduce_microstep: 77.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-31 21:29:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 21:29:53,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.74 | bwd_microstep: 5045.75 | bwd_inner_microstep: 4654.79 | bwd_allreduce_microstep: 390.90 | step_microstep: 182.89 [2024-07-31 21:29:53,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28986.67 | bwd: 41062.08 | bwd_inner: 39978.05 | bwd_allreduce: 1083.55 | step: 183.60 80%|███████▉ | 983/1230 [19:17:59<4:46:45, 69.66s/it] {'loss': 1.118, 'learning_rate': 2.041817816169187e-06, 'epoch': 0.8} 80%|███████▉ | 983/1230 [19:17:59<4:46:45, 69.66s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3995 [2024-07-31 21:30:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3854.45 | bwd_microstep: 5280.52 | bwd_inner_microstep: 5261.31 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762 [2024-07-31 
21:30:11,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.29 | bwd_microstep: 5003.96 | bwd_inner_microstep: 4983.98 | bwd_allreduce_microstep: 19.91 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-31 21:30:20,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.79 | bwd_microstep: 4994.56 | bwd_inner_microstep: 4975.17 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 21:30:29,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.99 | bwd_microstep: 5036.36 | bwd_inner_microstep: 5010.47 | bwd_allreduce_microstep: 25.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 21:30:37,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.99 | bwd_microstep: 5013.04 | bwd_inner_microstep: 4993.65 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-31 21:30:46,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.89 | bwd_microstep: 5188.41 | bwd_inner_microstep: 4785.12 | bwd_allreduce_microstep: 403.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-31 21:30:55,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.95 | bwd_microstep: 5148.59 | bwd_inner_microstep: 5100.22 | bwd_allreduce_microstep: 48.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-07-31 21:31:04,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 21:31:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.93 | bwd_microstep: 4890.89 | 
bwd_inner_microstep: 4871.48 | bwd_allreduce_microstep: 19.34 | step_microstep: 188.21 [2024-07-31 21:31:04,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29705.18 | bwd: 40556.33 | bwd_inner: 39981.34 | bwd_allreduce: 574.50 | step: 188.79 80%|████████ | 984/1230 [19:19:10<4:46:46, 69.94s/it] {'loss': 1.1778, 'learning_rate': 2.025899525113474e-06, 'epoch': 0.8} 80%|████████ | 984/1230 [19:19:10<4:46:46, 69.94s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4078 [2024-07-31 21:31:13,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3897.51 | bwd_microstep: 5484.09 | bwd_inner_microstep: 5456.79 | bwd_allreduce_microstep: 27.24 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2822 [2024-07-31 21:31:22,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.45 | bwd_microstep: 5145.95 | bwd_inner_microstep: 4746.98 | bwd_allreduce_microstep: 398.90 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3593 [2024-07-31 21:31:31,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.04 | bwd_microstep: 5082.85 | bwd_inner_microstep: 5025.90 | bwd_allreduce_microstep: 56.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3813 [2024-07-31 21:31:39,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.23 | bwd_microstep: 5075.12 | bwd_inner_microstep: 5055.80 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-31 21:31:48,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.24 | bwd_microstep: 4975.91 | bwd_inner_microstep: 4956.46 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 
21:31:57,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.90 | bwd_microstep: 5138.67 | bwd_inner_microstep: 5072.90 | bwd_allreduce_microstep: 65.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 21:32:06,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.78 | bwd_microstep: 5049.92 | bwd_inner_microstep: 4984.41 | bwd_allreduce_microstep: 65.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 21:32:15,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 21:32:15,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.32 | bwd_microstep: 5237.56 | bwd_inner_microstep: 4831.75 | bwd_allreduce_microstep: 405.75 | step_microstep: 181.87 [2024-07-31 21:32:15,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29289.38 | bwd: 41190.06 | bwd_inner: 40130.93 | bwd_allreduce: 1058.65 | step: 182.44 80%|████████ | 985/1230 [19:20:20<4:46:40, 70.21s/it] {'loss': 1.0896, 'learning_rate': 2.010036530897361e-06, 'epoch': 0.8} 80%|████████ | 985/1230 [19:20:20<4:46:40, 70.21s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2376 [2024-07-31 21:32:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.62 | bwd_microstep: 5280.39 | bwd_inner_microstep: 4873.20 | bwd_allreduce_microstep: 407.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-31 21:32:33,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.51 | bwd_microstep: 5496.33 | bwd_inner_microstep: 5071.58 | bwd_allreduce_microstep: 424.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 21:32:41,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3671.27 | bwd_microstep: 4964.62 | bwd_inner_microstep: 4935.24 | bwd_allreduce_microstep: 29.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-31 21:32:50,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.21 | bwd_microstep: 4995.26 | bwd_inner_microstep: 4975.88 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-31 21:32:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.40 | bwd_microstep: 4980.54 | bwd_inner_microstep: 4961.15 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 21:33:07,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.93 | bwd_microstep: 5157.44 | bwd_inner_microstep: 5083.63 | bwd_allreduce_microstep: 73.74 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-31 21:33:16,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.44 | bwd_microstep: 5174.61 | bwd_inner_microstep: 4769.70 | bwd_allreduce_microstep: 404.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-31 21:33:25,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 21:33:25,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.87 | bwd_microstep: 5101.08 | bwd_inner_microstep: 5033.90 | bwd_allreduce_microstep: 67.10 | step_microstep: 181.69 [2024-07-31 21:33:25,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29092.16 | bwd: 41150.25 | bwd_inner: 39704.23 | bwd_allreduce: 1445.53 | step: 182.27 80%|████████ | 986/1230 [19:21:31<4:45:57, 70.32s/it] {'loss': 1.1314, 'learning_rate': 1.994228943523654e-06, 'epoch': 
0.8} 80%|████████ | 986/1230 [19:21:31<4:45:57, 70.32s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 21:33:34,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.55 | bwd_microstep: 5191.81 | bwd_inner_microstep: 5172.70 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2286 [2024-07-31 21:33:43,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.25 | bwd_microstep: 5198.95 | bwd_inner_microstep: 4798.20 | bwd_allreduce_microstep: 400.68 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3913 [2024-07-31 21:33:52,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.64 | bwd_microstep: 5145.69 | bwd_inner_microstep: 5121.21 | bwd_allreduce_microstep: 24.41 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-31 21:34:00,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.16 | bwd_microstep: 4999.57 | bwd_inner_microstep: 4980.20 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-31 21:34:09,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.92 | bwd_microstep: 4976.44 | bwd_inner_microstep: 4957.11 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-07-31 21:34:18,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.67 | bwd_microstep: 5064.88 | bwd_inner_microstep: 5045.54 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 21:34:27,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3518.09 | bwd_microstep: 5118.44 | bwd_inner_microstep: 4720.32 | bwd_allreduce_microstep: 398.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-31 21:34:36,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:34:36,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.54 | bwd_microstep: 5157.49 | bwd_inner_microstep: 5087.05 | bwd_allreduce_microstep: 70.36 | step_microstep: 181.82 [2024-07-31 21:34:36,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29290.72 | bwd: 40853.25 | bwd_inner: 39882.29 | bwd_allreduce: 970.46 | step: 182.41 80%|████████ | 987/1230 [19:22:41<4:44:58, 70.36s/it] {'loss': 1.08, 'learning_rate': 1.978476872610939e-06, 'epoch': 0.8} 80%|████████ | 987/1230 [19:22:41<4:44:58, 70.36s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3986 [2024-07-31 21:34:45,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3834.29 | bwd_microstep: 5241.69 | bwd_inner_microstep: 5222.67 | bwd_allreduce_microstep: 18.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2293 [2024-07-31 21:34:53,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.92 | bwd_microstep: 5202.01 | bwd_inner_microstep: 4796.59 | bwd_allreduce_microstep: 405.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 21:35:02,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.81 | bwd_microstep: 5092.10 | bwd_inner_microstep: 5067.47 | bwd_allreduce_microstep: 24.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 21:35:11,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.89 | bwd_microstep: 5180.76 | bwd_inner_microstep: 5100.25 | 
bwd_allreduce_microstep: 80.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 21:35:20,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.36 | bwd_microstep: 4971.04 | bwd_inner_microstep: 4951.70 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.07 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2184 [2024-07-31 21:35:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.18 | bwd_microstep: 5118.81 | bwd_inner_microstep: 4721.32 | bwd_allreduce_microstep: 397.42 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3653 [2024-07-31 21:35:37,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.90 | bwd_microstep: 4869.92 | bwd_inner_microstep: 4850.52 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3668 [2024-07-31 21:35:46,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 21:35:46,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.38 | bwd_microstep: 4855.17 | bwd_inner_microstep: 4829.94 | bwd_allreduce_microstep: 25.17 | step_microstep: 181.58 [2024-07-31 21:35:46,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29342.61 | bwd: 40531.49 | bwd_inner: 39540.40 | bwd_allreduce: 990.60 | step: 182.16 80%|████████ | 988/1230 [19:23:52<4:43:36, 70.32s/it] {'loss': 1.1247, 'learning_rate': 1.962780427392823e-06, 'epoch': 0.8} 80%|████████ | 988/1230 [19:23:52<4:43:36, 70.32s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3944 [2024-07-31 21:35:55,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3831.54 | bwd_microstep: 5156.67 | bwd_inner_microstep: 5137.56 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2259 [2024-07-31 21:36:04,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.76 | bwd_microstep: 5182.12 | bwd_inner_microstep: 4779.36 | bwd_allreduce_microstep: 402.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3843 [2024-07-31 21:36:12,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.42 | bwd_microstep: 5285.08 | bwd_inner_microstep: 5222.85 | bwd_allreduce_microstep: 62.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765 [2024-07-31 21:36:21,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.81 | bwd_microstep: 5041.13 | bwd_inner_microstep: 5017.86 | bwd_allreduce_microstep: 23.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 21:36:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.03 | bwd_microstep: 5219.23 | bwd_inner_microstep: 5138.48 | bwd_allreduce_microstep: 80.68 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3694 [2024-07-31 21:36:39,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.82 | bwd_microstep: 5247.78 | bwd_inner_microstep: 5150.80 | bwd_allreduce_microstep: 96.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-31 21:36:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.82 | bwd_microstep: 4788.12 | bwd_inner_microstep: 4768.75 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-31 21:36:56,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 21:36:56,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3520.40 | bwd_microstep: 5099.82 | bwd_inner_microstep: 4704.00 | bwd_allreduce_microstep: 395.75 | step_microstep: 181.37 [2024-07-31 21:36:56,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28758.51 | bwd: 41019.92 | bwd_inner: 39919.61 | bwd_allreduce: 1099.84 | step: 181.95 80%|████████ | 989/1230 [19:25:02<4:42:11, 70.25s/it] {'loss': 1.1553, 'learning_rate': 1.9471397167171735e-06, 'epoch': 0.8} 80%|████████ | 989/1230 [19:25:02<4:42:11, 70.25s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2396 [2024-07-31 21:37:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.38 | bwd_microstep: 5303.97 | bwd_inner_microstep: 4893.13 | bwd_allreduce_microstep: 410.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251 [2024-07-31 21:37:14,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.95 | bwd_microstep: 5251.38 | bwd_inner_microstep: 4843.49 | bwd_allreduce_microstep: 407.83 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3767 [2024-07-31 21:37:22,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.76 | bwd_microstep: 5176.77 | bwd_inner_microstep: 5136.13 | bwd_allreduce_microstep: 40.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-31 21:37:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.68 | bwd_microstep: 5124.85 | bwd_inner_microstep: 5076.99 | bwd_allreduce_microstep: 47.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 21:37:39,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3256.99 | bwd_microstep: 4995.16 | bwd_inner_microstep: 4931.80 | bwd_allreduce_microstep: 63.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per 
sample: 8.0, dynamic token length: 3650 [2024-07-31 21:37:48,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.08 | bwd_microstep: 5190.33 | bwd_inner_microstep: 5091.05 | bwd_allreduce_microstep: 99.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 21:37:57,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.78 | bwd_microstep: 4891.67 | bwd_inner_microstep: 4872.30 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 21:38:06,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 21:38:06,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.64 | bwd_microstep: 5112.80 | bwd_inner_microstep: 4716.77 | bwd_allreduce_microstep: 395.96 | step_microstep: 183.11 [2024-07-31 21:38:06,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28465.18 | bwd: 41046.92 | bwd_inner: 39561.60 | bwd_allreduce: 1484.84 | step: 183.69 80%|████████ | 990/1230 [19:26:12<4:40:31, 70.13s/it] {'loss': 1.146, 'learning_rate': 1.931554849045353e-06, 'epoch': 0.8} 80%|████████ | 990/1230 [19:26:12<4:40:31, 70.13s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3915 [2024-07-31 21:38:14,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.71 | bwd_microstep: 5289.48 | bwd_inner_microstep: 5221.53 | bwd_allreduce_microstep: 67.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-31 21:38:23,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.71 | bwd_microstep: 5054.74 | bwd_inner_microstep: 5027.23 | bwd_allreduce_microstep: 27.44 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3783 [2024-07-31 21:38:32,446] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.26 | bwd_microstep: 5141.85 | bwd_inner_microstep: 5104.74 | bwd_allreduce_microstep: 37.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3797 [2024-07-31 21:38:41,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.73 | bwd_microstep: 5037.31 | bwd_inner_microstep: 5017.97 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-31 21:38:49,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.66 | bwd_microstep: 5116.95 | bwd_inner_microstep: 5070.07 | bwd_allreduce_microstep: 46.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-31 21:38:58,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.99 | bwd_microstep: 4877.04 | bwd_inner_microstep: 4857.68 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-31 21:39:07,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.43 | bwd_microstep: 4977.38 | bwd_inner_microstep: 4941.88 | bwd_allreduce_microstep: 35.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-31 21:39:16,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 21:39:16,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.00 | bwd_microstep: 5059.00 | bwd_inner_microstep: 4999.52 | bwd_allreduce_microstep: 59.42 | step_microstep: 181.75 [2024-07-31 21:39:16,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28982.40 | bwd: 40553.74 | bwd_inner: 40240.58 | bwd_allreduce: 312.69 | step: 182.32 81%|████████ | 991/1230 [19:27:21<4:39:02, 70.05s/it] {'loss': 1.1474, 
'learning_rate': 1.916025932451496e-06, 'epoch': 0.81} 81%|████████ | 991/1230 [19:27:22<4:39:02, 70.05s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3992 [2024-07-31 21:39:25,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.08 | bwd_microstep: 5467.69 | bwd_inner_microstep: 5394.44 | bwd_allreduce_microstep: 73.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-31 21:39:34,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.09 | bwd_microstep: 5234.55 | bwd_inner_microstep: 5179.95 | bwd_allreduce_microstep: 54.54 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2133 [2024-07-31 21:39:43,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.88 | bwd_microstep: 5382.14 | bwd_inner_microstep: 4969.01 | bwd_allreduce_microstep: 413.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-31 21:39:51,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.17 | bwd_microstep: 5138.55 | bwd_inner_microstep: 5085.83 | bwd_allreduce_microstep: 52.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 21:40:00,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.14 | bwd_microstep: 4880.13 | bwd_inner_microstep: 4852.85 | bwd_allreduce_microstep: 27.21 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 21:40:08,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.64 | bwd_microstep: 4970.11 | bwd_inner_microstep: 4921.73 | bwd_allreduce_microstep: 48.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2106 [2024-07-31 21:40:17,615] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.42 | bwd_microstep: 5132.25 | bwd_inner_microstep: 4734.02 | bwd_allreduce_microstep: 398.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-31 21:40:26,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-31 21:40:26,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.51 | bwd_microstep: 5136.46 | bwd_inner_microstep: 4739.98 | bwd_allreduce_microstep: 396.41 | step_microstep: 181.56 [2024-07-31 21:40:26,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28711.84 | bwd: 41341.87 | bwd_inner: 39877.74 | bwd_allreduce: 1463.65 | step: 182.14 81%|████████ | 992/1230 [19:28:32<4:38:16, 70.15s/it] {'loss': 1.1209, 'learning_rate': 1.9005530746217238e-06, 'epoch': 0.81} 81%|████████ | 992/1230 [19:28:32<4:38:16, 70.15s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4084 [2024-07-31 21:40:35,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3841.11 | bwd_microstep: 5361.99 | bwd_inner_microstep: 5342.91 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-31 21:40:43,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.84 | bwd_microstep: 4908.71 | bwd_inner_microstep: 4850.33 | bwd_allreduce_microstep: 58.32 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2059 [2024-07-31 21:40:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.48 | bwd_microstep: 5235.12 | bwd_inner_microstep: 4829.17 | bwd_allreduce_microstep: 405.88 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2060 [2024-07-31 21:41:01,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.72 | 
bwd_microstep: 5227.04 | bwd_inner_microstep: 4821.95 | bwd_allreduce_microstep: 405.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 21:41:10,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.24 | bwd_microstep: 5147.02 | bwd_inner_microstep: 5069.26 | bwd_allreduce_microstep: 77.69 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3698 [2024-07-31 21:41:18,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.07 | bwd_microstep: 5116.88 | bwd_inner_microstep: 5033.17 | bwd_allreduce_microstep: 83.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 21:41:27,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.52 | bwd_microstep: 4895.45 | bwd_inner_microstep: 4874.41 | bwd_allreduce_microstep: 20.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-31 21:41:36,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 21:41:36,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.76 | bwd_microstep: 5234.50 | bwd_inner_microstep: 4829.08 | bwd_allreduce_microstep: 405.35 | step_microstep: 182.59 [2024-07-31 21:41:36,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28662.63 | bwd: 41126.68 | bwd_inner: 39650.22 | bwd_allreduce: 1475.97 | step: 183.28 81%|████████ | 993/1230 [19:29:42<4:37:03, 70.14s/it] {'loss': 1.0977, 'learning_rate': 1.8851363828534253e-06, 'epoch': 0.81} 81%|████████ | 993/1230 [19:29:42<4:37:03, 70.14s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3633 [2024-07-31 21:41:45,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.54 | bwd_microstep: 5244.37 | bwd_inner_microstep: 5148.65 | 
bwd_allreduce_microstep: 95.66 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2033 [2024-07-31 21:41:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3074.37 | bwd_microstep: 5126.18 | bwd_inner_microstep: 4732.66 | bwd_allreduce_microstep: 393.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3585 [2024-07-31 21:42:01,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.85 | bwd_microstep: 4976.16 | bwd_inner_microstep: 4911.23 | bwd_allreduce_microstep: 64.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-31 21:42:10,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.58 | bwd_microstep: 5175.40 | bwd_inner_microstep: 5092.98 | bwd_allreduce_microstep: 82.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-31 21:42:19,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.65 | bwd_microstep: 5089.87 | bwd_inner_microstep: 4695.99 | bwd_allreduce_microstep: 393.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-31 21:42:27,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.08 | bwd_microstep: 5056.25 | bwd_inner_microstep: 4666.46 | bwd_allreduce_microstep: 389.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 21:42:36,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.81 | bwd_microstep: 5040.27 | bwd_inner_microstep: 4976.51 | bwd_allreduce_microstep: 63.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-31 21:42:45,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 
[2024-07-31 21:42:45,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.13 | bwd_microstep: 5006.22 | bwd_inner_microstep: 4957.54 | bwd_allreduce_microstep: 48.62 | step_microstep: 181.77 [2024-07-31 21:42:45,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27549.92 | bwd: 40714.72 | bwd_inner: 39181.94 | bwd_allreduce: 1532.28 | step: 182.37 81%|████████ | 994/1230 [19:30:51<4:34:03, 69.68s/it] {'loss': 1.1195, 'learning_rate': 1.869775964054501e-06, 'epoch': 0.81} 81%|████████ | 994/1230 [19:30:51<4:34:03, 69.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 21:42:54,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.06 | bwd_microstep: 5587.13 | bwd_inner_microstep: 5521.46 | bwd_allreduce_microstep: 65.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-31 21:43:03,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.49 | bwd_microstep: 5283.65 | bwd_inner_microstep: 5220.84 | bwd_allreduce_microstep: 62.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-31 21:43:11,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.73 | bwd_microstep: 5050.32 | bwd_inner_microstep: 4973.63 | bwd_allreduce_microstep: 76.62 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-31 21:43:20,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.76 | bwd_microstep: 5140.74 | bwd_inner_microstep: 5061.79 | bwd_allreduce_microstep: 78.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-31 21:43:29,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.11 | bwd_microstep: 5171.13 | bwd_inner_microstep: 4769.00 | 
bwd_allreduce_microstep: 402.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-31 21:43:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.02 | bwd_microstep: 5123.83 | bwd_inner_microstep: 5057.25 | bwd_allreduce_microstep: 66.50 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3708 [2024-07-31 21:43:46,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.93 | bwd_microstep: 4937.43 | bwd_inner_microstep: 4887.99 | bwd_allreduce_microstep: 49.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-31 21:43:55,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 21:43:55,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.64 | bwd_microstep: 5185.37 | bwd_inner_microstep: 4782.53 | bwd_allreduce_microstep: 402.77 | step_microstep: 181.43 [2024-07-31 21:43:55,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28468.65 | bwd: 41479.57 | bwd_inner: 40274.43 | bwd_allreduce: 1204.67 | step: 182.02 81%|████████ | 995/1230 [19:32:01<4:33:36, 69.86s/it] {'loss': 1.1426, 'learning_rate': 1.8544719247426224e-06, 'epoch': 0.81} 81%|████████ | 995/1230 [19:32:01<4:33:36, 69.86s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 21:44:04,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3897.40 | bwd_microstep: 5441.79 | bwd_inner_microstep: 5414.96 | bwd_allreduce_microstep: 26.76 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2040 [2024-07-31 21:44:13,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.27 | bwd_microstep: 5322.90 | bwd_inner_microstep: 4909.99 | bwd_allreduce_microstep: 412.84 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 3707 [2024-07-31 21:44:22,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.61 | bwd_microstep: 5315.88 | bwd_inner_microstep: 5211.68 | bwd_allreduce_microstep: 104.14 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3613 [2024-07-31 21:44:31,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.25 | bwd_microstep: 5221.83 | bwd_inner_microstep: 5113.96 | bwd_allreduce_microstep: 107.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-31 21:44:40,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.70 | bwd_microstep: 5275.77 | bwd_inner_microstep: 4867.75 | bwd_allreduce_microstep: 407.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3688 [2024-07-31 21:44:49,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.33 | bwd_microstep: 5060.48 | bwd_inner_microstep: 4989.61 | bwd_allreduce_microstep: 70.80 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2152 [2024-07-31 21:44:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3011.87 | bwd_microstep: 4897.53 | bwd_inner_microstep: 4522.60 | bwd_allreduce_microstep: 374.87 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1040 [2024-07-31 21:45:06,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 21:45:06,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.54 | bwd_microstep: 5279.75 | bwd_inner_microstep: 4872.12 | bwd_allreduce_microstep: 407.57 | step_microstep: 181.98 [2024-07-31 21:45:06,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28430.87 | bwd: 41815.91 | bwd_inner: 39902.60 | bwd_allreduce: 
1912.82 | step: 182.56 81%|████████ | 996/1230 [19:33:11<4:33:16, 70.07s/it] {'loss': 1.1331, 'learning_rate': 1.8392243710444911e-06, 'epoch': 0.81} 81%|████████ | 996/1230 [19:33:11<4:33:16, 70.07s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3932 [2024-07-31 21:45:15,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.79 | bwd_microstep: 5270.49 | bwd_inner_microstep: 5216.98 | bwd_allreduce_microstep: 53.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-31 21:45:23,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.03 | bwd_microstep: 5017.23 | bwd_inner_microstep: 4630.89 | bwd_allreduce_microstep: 386.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-31 21:45:31,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.17 | bwd_microstep: 5058.67 | bwd_inner_microstep: 5031.14 | bwd_allreduce_microstep: 27.47 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3728 [2024-07-31 21:45:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.61 | bwd_microstep: 5047.80 | bwd_inner_microstep: 5014.26 | bwd_allreduce_microstep: 33.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-31 21:45:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.58 | bwd_microstep: 5138.04 | bwd_inner_microstep: 5067.40 | bwd_allreduce_microstep: 70.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 21:45:57,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.51 | bwd_microstep: 4887.63 | bwd_inner_microstep: 4511.36 | bwd_allreduce_microstep: 376.21 | step_microstep: 0.08 dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3679 [2024-07-31 21:46:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.67 | bwd_microstep: 5035.98 | bwd_inner_microstep: 4997.00 | bwd_allreduce_microstep: 38.92 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2150 [2024-07-31 21:46:14,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 21:46:14,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.41 | bwd_microstep: 5095.28 | bwd_inner_microstep: 4698.91 | bwd_allreduce_microstep: 396.29 | step_microstep: 181.62 [2024-07-31 21:46:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27945.69 | bwd: 40551.10 | bwd_inner: 39167.87 | bwd_allreduce: 1382.75 | step: 182.21 81%|████████ | 997/1230 [19:34:20<4:30:39, 69.70s/it] {'loss': 1.1091, 'learning_rate': 1.8240334086951117e-06, 'epoch': 0.81} 81%|████████ | 997/1230 [19:34:20<4:30:39, 69.70s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3717 [2024-07-31 21:46:24,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.07 | bwd_microstep: 5582.09 | bwd_inner_microstep: 5491.64 | bwd_allreduce_microstep: 90.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-31 21:46:32,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.58 | bwd_microstep: 5123.70 | bwd_inner_microstep: 5044.89 | bwd_allreduce_microstep: 78.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3886 [2024-07-31 21:46:41,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.25 | bwd_microstep: 5215.83 | bwd_inner_microstep: 5169.13 | bwd_allreduce_microstep: 46.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 
21:46:50,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.57 | bwd_microstep: 5022.79 | bwd_inner_microstep: 4997.84 | bwd_allreduce_microstep: 24.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 21:46:58,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3278.77 | bwd_microstep: 4852.46 | bwd_inner_microstep: 4826.08 | bwd_allreduce_microstep: 26.31 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 21:47:07,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.10 | bwd_microstep: 4990.08 | bwd_inner_microstep: 4970.75 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-31 21:47:16,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.02 | bwd_microstep: 5105.32 | bwd_inner_microstep: 5036.59 | bwd_allreduce_microstep: 68.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 21:47:25,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 21:47:25,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.58 | bwd_microstep: 4978.89 | bwd_inner_microstep: 4959.53 | bwd_allreduce_microstep: 19.29 | step_microstep: 181.94 [2024-07-31 21:47:25,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29012.84 | bwd: 40871.14 | bwd_inner: 40496.39 | bwd_allreduce: 374.26 | step: 182.52 81%|████████ | 998/1230 [19:35:30<4:30:06, 69.86s/it] {'loss': 1.1769, 'learning_rate': 1.8088991430370506e-06, 'epoch': 0.81} 81%|████████ | 998/1230 [19:35:30<4:30:06, 69.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 21:47:33,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3612.71 | bwd_microstep: 5184.98 | bwd_inner_microstep: 5111.40 | bwd_allreduce_microstep: 73.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3865 [2024-07-31 21:47:42,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3823.59 | bwd_microstep: 5133.52 | bwd_inner_microstep: 5109.10 | bwd_allreduce_microstep: 24.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-31 21:47:51,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.99 | bwd_microstep: 5053.64 | bwd_inner_microstep: 5025.73 | bwd_allreduce_microstep: 27.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 21:48:00,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.54 | bwd_microstep: 4983.86 | bwd_inner_microstep: 4964.52 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-31 21:48:08,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.60 | bwd_microstep: 5042.64 | bwd_inner_microstep: 4650.64 | bwd_allreduce_microstep: 391.93 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 21:48:17,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.78 | bwd_microstep: 5052.16 | bwd_inner_microstep: 4994.26 | bwd_allreduce_microstep: 57.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 21:48:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.01 | bwd_microstep: 5127.32 | bwd_inner_microstep: 4731.73 | bwd_allreduce_microstep: 395.52 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3659 [2024-07-31 
21:48:35,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 21:48:35,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.96 | bwd_microstep: 5009.51 | bwd_inner_microstep: 4949.87 | bwd_allreduce_microstep: 59.57 | step_microstep: 182.84 [2024-07-31 21:48:35,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29041.08 | bwd: 40587.61 | bwd_inner: 39537.18 | bwd_allreduce: 1049.95 | step: 183.43 81%|████████ | 999/1230 [19:36:40<4:29:03, 69.89s/it] {'loss': 1.153, 'learning_rate': 1.7938216790197095e-06, 'epoch': 0.81} 81%|████████ | 999/1230 [19:36:40<4:29:03, 69.89s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3963 [2024-07-31 21:48:44,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3803.59 | bwd_microstep: 5234.32 | bwd_inner_microstep: 5215.18 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2320 [2024-07-31 21:48:52,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.95 | bwd_microstep: 5228.31 | bwd_inner_microstep: 4822.53 | bwd_allreduce_microstep: 405.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-31 21:49:01,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.94 | bwd_microstep: 5120.01 | bwd_inner_microstep: 5074.87 | bwd_allreduce_microstep: 45.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 21:49:10,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.21 | bwd_microstep: 5111.75 | bwd_inner_microstep: 5064.46 | bwd_allreduce_microstep: 47.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-31 21:49:19,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3603.92 | bwd_microstep: 5127.81 | bwd_inner_microstep: 5083.81 | bwd_allreduce_microstep: 43.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-31 21:49:27,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.79 | bwd_microstep: 5154.29 | bwd_inner_microstep: 5109.12 | bwd_allreduce_microstep: 45.10 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 21:49:36,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.01 | bwd_microstep: 4904.66 | bwd_inner_microstep: 4885.24 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3669 [2024-07-31 21:49:45,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 21:49:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.05 | bwd_microstep: 5026.12 | bwd_inner_microstep: 4953.90 | bwd_allreduce_microstep: 72.15 | step_microstep: 181.30 [2024-07-31 21:49:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29002.39 | bwd: 40907.27 | bwd_inner: 40209.05 | bwd_allreduce: 697.72 | step: 181.88 81%|████████▏ | 1000/1230 [19:37:51<4:28:18, 69.99s/it] {'loss': 1.1231, 'learning_rate': 1.77880112119859e-06, 'epoch': 0.81} 81%|████████▏ | 1000/1230 [19:37:51<4:28:18, 69.99s/it][INFO|trainer.py:2936] 2024-07-31 21:50:11,461 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000 [INFO|configuration_utils.py:473] 2024-07-31 21:50:11,463 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/config.json [INFO|configuration_utils.py:594] 2024-07-31 21:50:11,463 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/generation_config.json 
[INFO|modeling_utils.py:2501] 2024-07-31 21:51:04,094 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2433] 2024-07-31 21:51:04,096 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-07-31 21:51:04,096 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-07-31 21:51:04,096 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/added_tokens.json [2024-07-31 21:51:06,185] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved! [2024-07-31 21:51:06,571] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_model_states.pt [2024-07-31 21:51:06,572] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_model_states.pt... [2024-07-31 21:51:08,283] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_model_states.pt. [2024-07-31 21:51:08,794] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
[2024-07-31 21:52:05,801] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-07-31 21:52:05,801] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-07-31 21:52:09,368] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now! [INFO|trainer.py:3028] 2024-07-31 21:52:09,400 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/checkpoint-800] due to args.save_total_limit dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3546 [2024-07-31 21:52:48,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.88 | bwd_microstep: 5244.88 | bwd_inner_microstep: 5092.25 | bwd_allreduce_microstep: 152.56 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791 [2024-07-31 21:52:57,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.01 | bwd_microstep: 5076.97 | bwd_inner_microstep: 5034.13 | bwd_allreduce_microstep: 42.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815 [2024-07-31 21:53:06,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.13 | bwd_microstep: 5162.91 | bwd_inner_microstep: 5116.32 | bwd_allreduce_microstep: 46.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-31 21:53:14,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.92 | bwd_microstep: 4979.09 | bwd_inner_microstep: 4959.76 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3724 [2024-07-31 21:53:23,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.32 | bwd_microstep: 5176.13 | bwd_inner_microstep: 5114.23 | bwd_allreduce_microstep: 61.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-31 21:53:31,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2981.07 | bwd_microstep: 4839.43 | bwd_inner_microstep: 4465.12 | bwd_allreduce_microstep: 374.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 21:53:39,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.29 | bwd_microstep: 4994.99 | bwd_inner_microstep: 4944.06 | bwd_allreduce_microstep: 50.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-31 21:53:48,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:53:48,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.04 | bwd_microstep: 5017.87 | bwd_inner_microstep: 4964.25 | bwd_allreduce_microstep: 53.55 | step_microstep: 181.53 [2024-07-31 21:53:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28101.56 | bwd: 40492.25 | bwd_inner: 39690.05 | bwd_allreduce: 801.72 | step: 182.14 81%|████████▏ | 1001/1230 [19:41:54<7:45:41, 122.01s/it] {'loss': 1.1393, 'learning_rate': 1.7638375737345804e-06, 'epoch': 0.81} 81%|████████▏ | 1001/1230 [19:41:54<7:45:41, 122.01s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2428 [2024-07-31 21:53:57,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.36 | bwd_microstep: 5293.23 | bwd_inner_microstep: 4886.03 | bwd_allreduce_microstep: 407.13 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3796 [2024-07-31 21:54:06,396] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.48 | bwd_microstep: 5165.05 | bwd_inner_microstep: 5096.47 | bwd_allreduce_microstep: 68.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 21:54:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.07 | bwd_microstep: 4856.70 | bwd_inner_microstep: 4806.41 | bwd_allreduce_microstep: 50.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-31 21:54:23,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.82 | bwd_microstep: 5035.95 | bwd_inner_microstep: 5012.19 | bwd_allreduce_microstep: 23.70 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-31 21:54:31,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.40 | bwd_microstep: 5015.89 | bwd_inner_microstep: 4977.95 | bwd_allreduce_microstep: 37.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-31 21:54:40,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.98 | bwd_microstep: 5247.99 | bwd_inner_microstep: 4840.91 | bwd_allreduce_microstep: 407.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 21:54:49,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.03 | bwd_microstep: 5070.01 | bwd_inner_microstep: 5006.40 | bwd_allreduce_microstep: 63.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-07-31 21:54:58,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 21:54:58,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.19 | bwd_microstep: 5037.90 | bwd_inner_microstep: 5018.58 | 
bwd_allreduce_microstep: 19.24 | step_microstep: 181.90 [2024-07-31 21:54:58,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28639.23 | bwd: 40722.68 | bwd_inner: 39644.89 | bwd_allreduce: 1077.32 | step: 182.49 81%|████████▏ | 1002/1230 [19:43:04<6:44:00, 106.32s/it] {'loss': 1.1331, 'learning_rate': 1.7489311403932274e-06, 'epoch': 0.81} 81%|████████▏ | 1002/1230 [19:43:04<6:44:00, 106.32s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3909 [2024-07-31 21:55:07,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.23 | bwd_microstep: 5133.68 | bwd_inner_microstep: 5099.55 | bwd_allreduce_microstep: 34.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2246 [2024-07-31 21:55:15,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.51 | bwd_microstep: 5178.85 | bwd_inner_microstep: 4778.03 | bwd_allreduce_microstep: 400.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3894 [2024-07-31 21:55:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.56 | bwd_microstep: 5052.18 | bwd_inner_microstep: 5021.68 | bwd_allreduce_microstep: 30.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-31 21:55:33,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.28 | bwd_microstep: 5123.22 | bwd_inner_microstep: 5051.29 | bwd_allreduce_microstep: 71.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124 [2024-07-31 21:55:41,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.57 | bwd_microstep: 4855.80 | bwd_inner_microstep: 4480.67 | bwd_allreduce_microstep: 375.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-31 21:55:49,981] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.53 | bwd_microstep: 5110.29 | bwd_inner_microstep: 5038.75 | bwd_allreduce_microstep: 71.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3632 [2024-07-31 21:55:58,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.83 | bwd_microstep: 4887.41 | bwd_inner_microstep: 4860.03 | bwd_allreduce_microstep: 27.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 21:56:07,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 21:56:07,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.15 | bwd_microstep: 5051.59 | bwd_inner_microstep: 4984.38 | bwd_allreduce_microstep: 67.15 | step_microstep: 182.09 [2024-07-31 21:56:07,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28297.57 | bwd: 40392.99 | bwd_inner: 39314.31 | bwd_allreduce: 1078.19 | step: 182.67 82%|████████▏ | 1003/1230 [19:44:13<5:59:54, 95.13s/it] {'loss': 1.1521, 'learning_rate': 1.7340819245440144e-06, 'epoch': 0.82} 82%|████████▏ | 1003/1230 [19:44:13<5:59:54, 95.13s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2435 [2024-07-31 21:56:16,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.10 | bwd_microstep: 5395.02 | bwd_inner_microstep: 4980.70 | bwd_allreduce_microstep: 414.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 21:56:25,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.93 | bwd_microstep: 5194.27 | bwd_inner_microstep: 5138.09 | bwd_allreduce_microstep: 56.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940 [2024-07-31 21:56:34,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.08 | 
bwd_microstep: 5130.63 | bwd_inner_microstep: 5095.31 | bwd_allreduce_microstep: 35.25 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3619 [2024-07-31 21:56:42,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.18 | bwd_microstep: 5165.58 | bwd_inner_microstep: 5104.72 | bwd_allreduce_microstep: 60.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-31 21:56:51,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.50 | bwd_microstep: 5163.18 | bwd_inner_microstep: 5111.50 | bwd_allreduce_microstep: 51.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 21:57:00,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.36 | bwd_microstep: 4881.21 | bwd_inner_microstep: 4861.86 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2214 [2024-07-31 21:57:08,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.63 | bwd_microstep: 5097.74 | bwd_inner_microstep: 4701.13 | bwd_allreduce_microstep: 396.55 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3658 [2024-07-31 21:57:17,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 21:57:17,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.84 | bwd_microstep: 4964.99 | bwd_inner_microstep: 4928.90 | bwd_allreduce_microstep: 36.01 | step_microstep: 181.63 [2024-07-31 21:57:17,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29092.53 | bwd: 40992.61 | bwd_inner: 39922.16 | bwd_allreduce: 1069.96 | step: 182.21 82%|████████▏ | 1004/1230 [19:45:23<5:30:23, 87.72s/it] {'loss': 1.1218, 'learning_rate': 1.7192900291596493e-06, 'epoch': 0.82} 82%|████████▏ | 
1004/1230 [19:45:23<5:30:23, 87.72s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4006 [2024-07-31 21:57:26,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3844.47 | bwd_microstep: 5275.03 | bwd_inner_microstep: 5255.94 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-31 21:57:35,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.90 | bwd_microstep: 5003.25 | bwd_inner_microstep: 4983.76 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-31 21:57:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.90 | bwd_microstep: 5058.22 | bwd_inner_microstep: 4670.41 | bwd_allreduce_microstep: 387.74 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2202 [2024-07-31 21:57:52,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.08 | bwd_microstep: 5135.85 | bwd_inner_microstep: 4738.21 | bwd_allreduce_microstep: 397.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 21:58:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.54 | bwd_microstep: 5216.22 | bwd_inner_microstep: 4810.51 | bwd_allreduce_microstep: 405.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-31 21:58:10,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.21 | bwd_microstep: 5209.12 | bwd_inner_microstep: 4802.76 | bwd_allreduce_microstep: 406.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 21:58:18,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.43 | 
bwd_microstep: 4981.66 | bwd_inner_microstep: 4962.23 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 21:58:27,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 21:58:27,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.85 | bwd_microstep: 5074.90 | bwd_inner_microstep: 5045.41 | bwd_allreduce_microstep: 29.43 | step_microstep: 181.82 [2024-07-31 21:58:27,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28706.29 | bwd: 40954.23 | bwd_inner: 39269.18 | bwd_allreduce: 1684.54 | step: 182.53 82%|████████▏ | 1005/1230 [19:46:33<5:08:59, 82.40s/it] {'loss': 1.1196, 'learning_rate': 1.7045555568153438e-06, 'epoch': 0.82} 82%|████████▏ | 1005/1230 [19:46:33<5:08:59, 82.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3913 [2024-07-31 21:58:36,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.27 | bwd_microstep: 5408.58 | bwd_inner_microstep: 5349.50 | bwd_allreduce_microstep: 59.00 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3827 [2024-07-31 21:58:45,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.05 | bwd_microstep: 5072.74 | bwd_inner_microstep: 5048.16 | bwd_allreduce_microstep: 24.51 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-07-31 21:58:54,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.14 | bwd_microstep: 5153.68 | bwd_inner_microstep: 5097.43 | bwd_allreduce_microstep: 56.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-31 21:59:03,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.65 | bwd_microstep: 5183.44 | bwd_inner_microstep: 5097.71 | 
bwd_allreduce_microstep: 85.66 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 21:59:12,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.49 | bwd_microstep: 5159.68 | bwd_inner_microstep: 5077.70 | bwd_allreduce_microstep: 81.90 | step_microstep: 0.08 dynamic ViT batch size: 5, images per sample: 2.5, dynamic token length: 1410 [2024-07-31 21:59:21,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.91 | bwd_microstep: 5264.05 | bwd_inner_microstep: 4859.09 | bwd_allreduce_microstep: 404.90 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3654 [2024-07-31 21:59:29,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.13 | bwd_microstep: 5002.36 | bwd_inner_microstep: 4930.29 | bwd_allreduce_microstep: 72.00 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2169 [2024-07-31 21:59:38,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 21:59:38,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.46 | bwd_microstep: 5093.74 | bwd_inner_microstep: 4699.33 | bwd_allreduce_microstep: 394.35 | step_microstep: 181.73 [2024-07-31 21:59:38,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28898.99 | bwd: 41338.26 | bwd_inner: 40159.14 | bwd_allreduce: 1178.62 | step: 182.45 82%|████████▏ | 1006/1230 [19:47:44<4:54:22, 78.85s/it] {'loss': 1.1207, 'learning_rate': 1.6898786096881104e-06, 'epoch': 0.82} 82%|████████▏ | 1006/1230 [19:47:44<4:54:22, 78.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 21:59:47,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3858.53 | bwd_microstep: 5212.85 | bwd_inner_microstep: 5183.42 | bwd_allreduce_microstep: 29.36 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3657 [2024-07-31 21:59:56,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.09 | bwd_microstep: 4990.07 | bwd_inner_microstep: 4950.89 | bwd_allreduce_microstep: 39.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-31 22:00:04,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3263.78 | bwd_microstep: 5061.74 | bwd_inner_microstep: 4986.14 | bwd_allreduce_microstep: 75.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3842 [2024-07-31 22:00:13,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.33 | bwd_microstep: 5111.39 | bwd_inner_microstep: 5091.95 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-31 22:00:22,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.16 | bwd_microstep: 5018.21 | bwd_inner_microstep: 4998.74 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-31 22:00:30,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.80 | bwd_microstep: 5059.39 | bwd_inner_microstep: 4992.70 | bwd_allreduce_microstep: 66.62 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 22:00:39,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.93 | bwd_microstep: 5159.40 | bwd_inner_microstep: 4758.38 | bwd_allreduce_microstep: 400.95 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3680 [2024-07-31 22:00:48,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 22:00:48,544] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3600.45 | bwd_microstep: 5169.84 | bwd_inner_microstep: 5088.21 | bwd_allreduce_microstep: 81.56 | step_microstep: 182.54 [2024-07-31 22:00:48,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29028.96 | bwd: 40782.87 | bwd_inner: 40050.38 | bwd_allreduce: 732.00 | step: 183.24 82%|████████▏ | 1007/1230 [19:48:54<4:43:21, 76.24s/it] {'loss': 1.1627, 'learning_rate': 1.6752592895560493e-06, 'epoch': 0.82} 82%|████████▏ | 1007/1230 [19:48:54<4:43:21, 76.24s/it]dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1332 [2024-07-31 22:00:57,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.74 | bwd_microstep: 5569.23 | bwd_inner_microstep: 5140.62 | bwd_allreduce_microstep: 428.54 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3843 [2024-07-31 22:01:05,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3247.27 | bwd_microstep: 4901.50 | bwd_inner_microstep: 4882.20 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2267 [2024-07-31 22:01:14,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.73 | bwd_microstep: 5218.04 | bwd_inner_microstep: 4812.21 | bwd_allreduce_microstep: 405.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-31 22:01:23,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.73 | bwd_microstep: 5061.37 | bwd_inner_microstep: 5034.63 | bwd_allreduce_microstep: 26.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 22:01:32,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.70 | bwd_microstep: 5108.43 | bwd_inner_microstep: 5038.10 | bwd_allreduce_microstep: 70.26 | step_microstep: 0.08 dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2169 [2024-07-31 22:01:40,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3030.97 | bwd_microstep: 5023.42 | bwd_inner_microstep: 4636.04 | bwd_allreduce_microstep: 387.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 22:01:49,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.43 | bwd_microstep: 5205.23 | bwd_inner_microstep: 5131.23 | bwd_allreduce_microstep: 73.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 22:01:57,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-31 22:01:57,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.13 | bwd_microstep: 4887.41 | bwd_inner_microstep: 4867.97 | bwd_allreduce_microstep: 19.36 | step_microstep: 181.99 [2024-07-31 22:01:57,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28117.59 | bwd: 40974.60 | bwd_inner: 39542.94 | bwd_allreduce: 1431.17 | step: 182.58 82%|████████▏ | 1008/1230 [19:50:03<4:34:31, 74.19s/it] {'loss': 1.1614, 'learning_rate': 1.6606976977976408e-06, 'epoch': 0.82} 82%|████████▏ | 1008/1230 [19:50:03<4:34:31, 74.19s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 22:02:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.31 | bwd_microstep: 5451.23 | bwd_inner_microstep: 5341.60 | bwd_allreduce_microstep: 109.56 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3873 [2024-07-31 22:02:16,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.22 | bwd_microstep: 5114.20 | bwd_inner_microstep: 5094.89 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 
22:02:23,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2980.04 | bwd_microstep: 4797.03 | bwd_inner_microstep: 4425.87 | bwd_allreduce_microstep: 371.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-31 22:02:32,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.13 | bwd_microstep: 4971.57 | bwd_inner_microstep: 4943.45 | bwd_allreduce_microstep: 28.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-31 22:02:41,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.63 | bwd_microstep: 4948.50 | bwd_inner_microstep: 4921.77 | bwd_allreduce_microstep: 26.67 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2860 [2024-07-31 22:02:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.27 | bwd_microstep: 5211.04 | bwd_inner_microstep: 4803.54 | bwd_allreduce_microstep: 407.43 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3696 [2024-07-31 22:02:58,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3107.19 | bwd_microstep: 4879.63 | bwd_inner_microstep: 4838.92 | bwd_allreduce_microstep: 40.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 22:03:06,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 22:03:06,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.10 | bwd_microstep: 4972.08 | bwd_inner_microstep: 4918.66 | bwd_allreduce_microstep: 53.33 | step_microstep: 203.36 [2024-07-31 22:03:06,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28008.80 | bwd: 40345.26 | bwd_inner: 39288.63 | bwd_allreduce: 1056.13 | step: 203.95 82%|████████▏ | 1009/1230 [19:51:12<4:27:21, 
72.59s/it] {'loss': 1.1378, 'learning_rate': 1.6461939353910473e-06, 'epoch': 0.82} 82%|████████▏ | 1009/1230 [19:51:12<4:27:21, 72.59s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3911 [2024-07-31 22:03:15,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3819.88 | bwd_microstep: 5154.35 | bwd_inner_microstep: 5135.23 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-31 22:03:24,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.82 | bwd_microstep: 5268.32 | bwd_inner_microstep: 4861.16 | bwd_allreduce_microstep: 407.10 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-31 22:03:33,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.53 | bwd_microstep: 5120.05 | bwd_inner_microstep: 5085.46 | bwd_allreduce_microstep: 34.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-07-31 22:03:41,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3040.49 | bwd_microstep: 5004.53 | bwd_inner_microstep: 4617.51 | bwd_allreduce_microstep: 386.96 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-31 22:03:50,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.29 | bwd_microstep: 5169.94 | bwd_inner_microstep: 5115.55 | bwd_allreduce_microstep: 54.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 22:03:58,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.38 | bwd_microstep: 5018.27 | bwd_inner_microstep: 4964.61 | bwd_allreduce_microstep: 53.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3702 [2024-07-31 
22:04:07,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.58 | bwd_microstep: 5049.53 | bwd_inner_microstep: 4979.78 | bwd_allreduce_microstep: 69.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 22:04:16,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 22:04:16,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3393.60 | bwd_microstep: 4852.44 | bwd_inner_microstep: 4817.73 | bwd_allreduce_microstep: 34.64 | step_microstep: 181.82 [2024-07-31 22:04:16,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28334.49 | bwd: 40637.42 | bwd_inner: 39576.97 | bwd_allreduce: 1059.97 | step: 182.51 82%|████████▏ | 1010/1230 [19:52:21<4:22:32, 71.60s/it] {'loss': 1.1669, 'learning_rate': 1.631748102913412e-06, 'epoch': 0.82} 82%|████████▏ | 1010/1230 [19:52:21<4:22:32, 71.60s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2271 [2024-07-31 22:04:25,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.27 | bwd_microstep: 5375.76 | bwd_inner_microstep: 4962.53 | bwd_allreduce_microstep: 413.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 22:04:33,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.03 | bwd_microstep: 5224.37 | bwd_inner_microstep: 5162.11 | bwd_allreduce_microstep: 62.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3606 [2024-07-31 22:04:42,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.88 | bwd_microstep: 5116.86 | bwd_inner_microstep: 5046.08 | bwd_allreduce_microstep: 70.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-31 22:04:50,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3034.77 | bwd_microstep: 4971.04 | bwd_inner_microstep: 4587.88 | bwd_allreduce_microstep: 383.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 22:04:59,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.41 | bwd_microstep: 5098.54 | bwd_inner_microstep: 5033.52 | bwd_allreduce_microstep: 64.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-31 22:05:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.71 | bwd_microstep: 5231.21 | bwd_inner_microstep: 4825.27 | bwd_allreduce_microstep: 405.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3667 [2024-07-31 22:05:16,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.12 | bwd_microstep: 4866.14 | bwd_inner_microstep: 4846.56 | bwd_allreduce_microstep: 19.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 22:05:25,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 22:05:25,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.30 | bwd_microstep: 5211.32 | bwd_inner_microstep: 5134.84 | bwd_allreduce_microstep: 76.41 | step_microstep: 181.14 [2024-07-31 22:05:25,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28250.42 | bwd: 41095.22 | bwd_inner: 39598.74 | bwd_allreduce: 1495.99 | step: 181.72 82%|████████▏ | 1011/1230 [19:53:31<4:19:14, 71.02s/it] {'loss': 1.1814, 'learning_rate': 1.6173603005401505e-06, 'epoch': 0.82} 82%|████████▏ | 1011/1230 [19:53:31<4:19:14, 71.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3909 [2024-07-31 22:05:34,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.08 | bwd_microstep: 5392.93 | bwd_inner_microstep: 
5324.87 | bwd_allreduce_microstep: 67.99 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3869 [2024-07-31 22:05:43,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.26 | bwd_microstep: 5309.87 | bwd_inner_microstep: 5244.47 | bwd_allreduce_microstep: 65.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 22:05:52,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.82 | bwd_microstep: 5261.28 | bwd_inner_microstep: 4853.53 | bwd_allreduce_microstep: 407.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 22:06:01,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.02 | bwd_microstep: 5412.65 | bwd_inner_microstep: 4993.86 | bwd_allreduce_microstep: 418.73 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-31 22:06:10,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.98 | bwd_microstep: 4970.92 | bwd_inner_microstep: 4935.35 | bwd_allreduce_microstep: 35.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-31 22:06:19,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.42 | bwd_microstep: 5013.51 | bwd_inner_microstep: 4974.93 | bwd_allreduce_microstep: 38.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-31 22:06:27,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.37 | bwd_microstep: 5109.54 | bwd_inner_microstep: 5039.46 | bwd_allreduce_microstep: 69.97 | step_microstep: 0.29 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-31 22:06:36,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_step: 92.78 [2024-07-31 22:06:36,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.33 | bwd_microstep: 5189.01 | bwd_inner_microstep: 5110.83 | bwd_allreduce_microstep: 78.12 | step_microstep: 182.36 [2024-07-31 22:06:36,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29061.19 | bwd: 41659.70 | bwd_inner: 40477.24 | bwd_allreduce: 1181.94 | step: 183.17 82%|████████▏ | 1012/1230 [19:54:42<4:18:05, 71.04s/it] {'loss': 1.1652, 'learning_rate': 1.6030306280442764e-06, 'epoch': 0.82} 82%|████████▏ | 1012/1230 [19:54:42<4:18:05, 71.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 22:06:46,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 4042.79 | bwd_microstep: 5421.56 | bwd_inner_microstep: 5402.39 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3556 [2024-07-31 22:06:55,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.17 | bwd_microstep: 5202.49 | bwd_inner_microstep: 5107.76 | bwd_allreduce_microstep: 94.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-31 22:07:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.06 | bwd_microstep: 5015.94 | bwd_inner_microstep: 4996.58 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-31 22:07:12,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.61 | bwd_microstep: 5253.52 | bwd_inner_microstep: 4845.53 | bwd_allreduce_microstep: 407.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-31 22:07:21,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.92 | bwd_microstep: 5015.34 | bwd_inner_microstep: 
4979.95 | bwd_allreduce_microstep: 35.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-31 22:07:29,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.75 | bwd_microstep: 4683.74 | bwd_inner_microstep: 4662.84 | bwd_allreduce_microstep: 20.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 22:07:38,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.21 | bwd_microstep: 4990.75 | bwd_inner_microstep: 4957.45 | bwd_allreduce_microstep: 33.22 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2125 [2024-07-31 22:07:46,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 22:07:46,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.90 | bwd_microstep: 5068.25 | bwd_inner_microstep: 4675.28 | bwd_allreduce_microstep: 392.90 | step_microstep: 181.74 [2024-07-31 22:07:46,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29056.30 | bwd: 40651.58 | bwd_inner: 39627.74 | bwd_allreduce: 1023.34 | step: 182.33 82%|████████▏ | 1013/1230 [19:55:52<4:15:50, 70.74s/it] {'loss': 1.1286, 'learning_rate': 1.588759184795694e-06, 'epoch': 0.82} 82%|████████▏ | 1013/1230 [19:55:52<4:15:50, 70.74s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3953 [2024-07-31 22:07:55,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.09 | bwd_microstep: 5286.09 | bwd_inner_microstep: 5229.63 | bwd_allreduce_microstep: 56.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 22:08:04,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.41 | bwd_microstep: 5238.95 | bwd_inner_microstep: 5151.95 | bwd_allreduce_microstep: 86.93 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730
[2024-07-31 22:08:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.86 | bwd_microstep: 5177.82 | bwd_inner_microstep: 5116.83 | bwd_allreduce_microstep: 60.92 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2331
[2024-07-31 22:08:21,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.56 | bwd_microstep: 4887.86 | bwd_inner_microstep: 4513.29 | bwd_allreduce_microstep: 374.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731
[2024-07-31 22:08:30,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.26 | bwd_microstep: 5191.42 | bwd_inner_microstep: 5134.18 | bwd_allreduce_microstep: 57.16 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2102
[2024-07-31 22:08:38,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.62 | bwd_microstep: 5156.48 | bwd_inner_microstep: 4758.46 | bwd_allreduce_microstep: 397.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3124
[2024-07-31 22:08:47,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.26 | bwd_microstep: 5075.63 | bwd_inner_microstep: 4789.32 | bwd_allreduce_microstep: 286.23 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 22:08:56,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67
[2024-07-31 22:08:56,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.97 | bwd_microstep: 4920.06 | bwd_inner_microstep: 4893.79 | bwd_allreduce_microstep: 26.21 | step_microstep: 183.20
[2024-07-31 22:08:56,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28320.94 | bwd: 40934.29 | bwd_inner: 39587.40 | bwd_allreduce: 1346.41 | step: 183.78
82%|████████▏ | 1014/1230 [19:57:02<4:13:25, 70.39s/it]
{'loss': 1.1506, 'learning_rate': 1.574546069760514e-06, 'epoch': 0.82}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 22:09:05,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3887.64 | bwd_microstep: 5402.03 | bwd_inner_microstep: 5382.88 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587
[2024-07-31 22:09:14,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.26 | bwd_microstep: 5284.33 | bwd_inner_microstep: 5196.96 | bwd_allreduce_microstep: 87.30 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3801
[2024-07-31 22:09:23,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.17 | bwd_microstep: 5222.28 | bwd_inner_microstep: 5150.11 | bwd_allreduce_microstep: 72.11 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728
[2024-07-31 22:09:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.51 | bwd_microstep: 4976.14 | bwd_inner_microstep: 4956.85 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.19
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203
[2024-07-31 22:09:40,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.50 | bwd_microstep: 5139.17 | bwd_inner_microstep: 4739.33 | bwd_allreduce_microstep: 399.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665
[2024-07-31 22:09:49,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.37 | bwd_microstep: 5180.49 | bwd_inner_microstep: 5106.86 | bwd_allreduce_microstep: 73.57 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723
[2024-07-31 22:09:58,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.42 | bwd_microstep: 5026.09 | bwd_inner_microstep: 5002.89 | bwd_allreduce_microstep: 23.13 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675
[2024-07-31 22:10:07,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 22:10:07,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.53 | bwd_microstep: 5020.13 | bwd_inner_microstep: 4969.79 | bwd_allreduce_microstep: 50.27 | step_microstep: 181.74
[2024-07-31 22:10:07,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29276.31 | bwd: 41250.64 | bwd_inner: 40505.61 | bwd_allreduce: 744.54 | step: 182.44
83%|████████▎ | 1015/1230 [19:58:13<4:12:44, 70.53s/it]
{'loss': 1.1357, 'learning_rate': 1.5603913815003634e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3927
[2024-07-31 22:10:16,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.09 | bwd_microstep: 5174.81 | bwd_inner_microstep: 5155.77 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193
[2024-07-31 22:10:25,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.62 | bwd_microstep: 5281.12 | bwd_inner_microstep: 4869.49 | bwd_allreduce_microstep: 411.55 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2813
[2024-07-31 22:10:33,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3102.63 | bwd_microstep: 5118.52 | bwd_inner_microstep: 4723.32 | bwd_allreduce_microstep: 395.13 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593
[2024-07-31 22:10:42,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.10 | bwd_microstep: 5216.26 | bwd_inner_microstep: 5125.41 | bwd_allreduce_microstep: 90.78 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200
[2024-07-31 22:10:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.00 | bwd_microstep: 5188.07 | bwd_inner_microstep: 4782.56 | bwd_allreduce_microstep: 405.44 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709
[2024-07-31 22:10:59,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.87 | bwd_microstep: 5029.81 | bwd_inner_microstep: 4991.94 | bwd_allreduce_microstep: 37.81 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167
[2024-07-31 22:11:08,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.80 | bwd_microstep: 5085.58 | bwd_inner_microstep: 4690.39 | bwd_allreduce_microstep: 395.12 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139
[2024-07-31 22:11:17,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57
[2024-07-31 22:11:17,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.12 | bwd_microstep: 5165.25 | bwd_inner_microstep: 4763.25 | bwd_allreduce_microstep: 401.93 | step_microstep: 181.61
[2024-07-31 22:11:17,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28379.12 | bwd: 41259.40 | bwd_inner: 39102.07 | bwd_allreduce: 2156.84 | step: 182.18
83%|████████▎ | 1016/1230 [19:59:23<4:10:57, 70.36s/it]
{'loss': 1.2027, 'learning_rate': 1.5462952181717117e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096
[2024-07-31 22:11:26,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.81 | bwd_microstep: 5447.96 | bwd_inner_microstep: 5411.04 | bwd_allreduce_microstep: 36.86 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3816
[2024-07-31 22:11:35,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.57 | bwd_microstep: 5217.50 | bwd_inner_microstep: 5151.63 | bwd_allreduce_microstep: 65.80 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3931
[2024-07-31 22:11:43,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3415.97 | bwd_microstep: 5034.92 | bwd_inner_microstep: 4994.69 | bwd_allreduce_microstep: 40.17 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608
[2024-07-31 22:11:52,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.61 | bwd_microstep: 5372.85 | bwd_inner_microstep: 5267.88 | bwd_allreduce_microstep: 104.91 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626
[2024-07-31 22:12:01,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.75 | bwd_microstep: 5130.54 | bwd_inner_microstep: 5055.54 | bwd_allreduce_microstep: 74.93 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 22:12:10,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.08 | bwd_microstep: 5127.73 | bwd_inner_microstep: 5055.87 | bwd_allreduce_microstep: 71.79 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701
[2024-07-31 22:12:18,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.10 | bwd_microstep: 5010.90 | bwd_inner_microstep: 4955.87 | bwd_allreduce_microstep: 54.96 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3689
[2024-07-31 22:12:27,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 22:12:27,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.56 | bwd_microstep: 5029.19 | bwd_inner_microstep: 4962.79 | bwd_allreduce_microstep: 66.33 | step_microstep: 182.43
[2024-07-31 22:12:27,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28724.36 | bwd: 41371.58 | bwd_inner: 40855.25 | bwd_allreduce: 515.86 | step: 183.01
83%|████████▎ | 1017/1230 [20:00:33<4:09:51, 70.38s/it]
{'loss': 1.1373, 'learning_rate': 1.5322576775251808e-06, 'epoch': 0.83}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2442
[2024-07-31 22:12:36,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3322.21 | bwd_microstep: 5103.50 | bwd_inner_microstep: 4708.93 | bwd_allreduce_microstep: 394.49 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825
[2024-07-31 22:12:45,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.37 | bwd_microstep: 5126.47 | bwd_inner_microstep: 5098.80 | bwd_allreduce_microstep: 27.60 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2262
[2024-07-31 22:12:53,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.15 | bwd_microstep: 5008.71 | bwd_inner_microstep: 4622.26 | bwd_allreduce_microstep: 386.38 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209
[2024-07-31 22:13:01,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.20 | bwd_microstep: 5033.07 | bwd_inner_microstep: 4643.30 | bwd_allreduce_microstep: 389.71 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180
[2024-07-31 22:13:09,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.33 | bwd_microstep: 5069.98 | bwd_inner_microstep: 4678.00 | bwd_allreduce_microstep: 391.91 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695
[2024-07-31 22:13:18,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.73 | bwd_microstep: 5048.52 | bwd_inner_microstep: 5005.21 | bwd_allreduce_microstep: 43.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665
[2024-07-31 22:13:26,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.38 | bwd_microstep: 4876.09 | bwd_inner_microstep: 4829.57 | bwd_allreduce_microstep: 46.45 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695
[2024-07-31 22:13:35,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72
[2024-07-31 22:13:35,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.92 | bwd_microstep: 4911.87 | bwd_inner_microstep: 4889.62 | bwd_allreduce_microstep: 22.18 | step_microstep: 181.77
[2024-07-31 22:13:35,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27379.20 | bwd: 40178.20 | bwd_inner: 38475.63 | bwd_allreduce: 1702.07 | step: 182.47
83%|████████▎ | 1018/1230 [20:01:41<4:06:02, 69.63s/it]
{'loss': 1.1319, 'learning_rate': 1.5182788569048712e-06, 'epoch': 0.83}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2415
[2024-07-31 22:13:44,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.82 | bwd_microstep: 5399.18 | bwd_inner_microstep: 4982.04 | bwd_allreduce_microstep: 417.08 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3817
[2024-07-31 22:13:53,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.66 | bwd_microstep: 5285.01 | bwd_inner_microstep: 5196.73 | bwd_allreduce_microstep: 88.21 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253
[2024-07-31 22:14:02,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.14 | bwd_microstep: 5205.57 | bwd_inner_microstep: 4800.46 | bwd_allreduce_microstep: 405.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615
[2024-07-31 22:14:10,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3328.94 | bwd_microstep: 4940.60 | bwd_inner_microstep: 4896.15 | bwd_allreduce_microstep: 44.38 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698
[2024-07-31 22:14:19,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.30 | bwd_microstep: 5067.90 | bwd_inner_microstep: 5027.33 | bwd_allreduce_microstep: 40.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 22:14:28,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.44 | bwd_microstep: 5183.42 | bwd_inner_microstep: 5103.49 | bwd_allreduce_microstep: 79.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671
[2024-07-31 22:14:37,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.25 | bwd_microstep: 5106.10 | bwd_inner_microstep: 5040.59 | bwd_allreduce_microstep: 65.44 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749
[2024-07-31 22:14:46,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 22:14:46,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.98 | bwd_microstep: 5182.76 | bwd_inner_microstep: 5128.07 | bwd_allreduce_microstep: 54.62 | step_microstep: 182.41
[2024-07-31 22:14:46,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28736.45 | bwd: 41370.52 | bwd_inner: 40174.79 | bwd_allreduce: 1195.24 | step: 182.99
83%|████████▎ | 1019/1230 [20:02:51<4:05:43, 69.88s/it]
{'loss': 1.1618, 'learning_rate': 1.5043588532476806e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828
[2024-07-31 22:14:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.58 | bwd_microstep: 5357.61 | bwd_inner_microstep: 5306.48 | bwd_allreduce_microstep: 51.06 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2796
[2024-07-31 22:15:04,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.34 | bwd_microstep: 5217.74 | bwd_inner_microstep: 4812.38 | bwd_allreduce_microstep: 405.29 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749
[2024-07-31 22:15:12,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.97 | bwd_microstep: 5174.16 | bwd_inner_microstep: 5119.05 | bwd_allreduce_microstep: 55.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747
[2024-07-31 22:15:21,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.88 | bwd_microstep: 5172.91 | bwd_inner_microstep: 5117.56 | bwd_allreduce_microstep: 55.28 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170
[2024-07-31 22:15:29,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3011.42 | bwd_microstep: 4878.33 | bwd_inner_microstep: 4503.02 | bwd_allreduce_microstep: 375.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697
[2024-07-31 22:15:38,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.60 | bwd_microstep: 5172.12 | bwd_inner_microstep: 5096.54 | bwd_allreduce_microstep: 75.51 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2126
[2024-07-31 22:15:47,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.40 | bwd_microstep: 5071.51 | bwd_inner_microstep: 4678.73 | bwd_allreduce_microstep: 392.71 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680
[2024-07-31 22:15:55,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68
[2024-07-31 22:15:55,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.44 | bwd_microstep: 4893.70 | bwd_inner_microstep: 4871.34 | bwd_allreduce_microstep: 22.29 | step_microstep: 182.94
[2024-07-31 22:15:55,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28492.54 | bwd: 40938.06 | bwd_inner: 39505.04 | bwd_allreduce: 1432.53 | step: 183.52
83%|████████▎ | 1020/1230 [20:04:01<4:04:26, 69.84s/it]
{'loss': 1.1578, 'learning_rate': 1.49049776308265e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612
[2024-07-31 22:16:04,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3359.41 | bwd_microstep: 5154.35 | bwd_inner_microstep: 5085.82 | bwd_allreduce_microstep: 68.47 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742
[2024-07-31 22:16:13,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.85 | bwd_microstep: 5012.39 | bwd_inner_microstep: 4989.02 | bwd_allreduce_microstep: 23.31 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773
[2024-07-31 22:16:21,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.77 | bwd_microstep: 5008.13 | bwd_inner_microstep: 4988.80 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2848
[2024-07-31 22:16:30,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.30 | bwd_microstep: 5199.71 | bwd_inner_microstep: 4794.04 | bwd_allreduce_microstep: 405.59 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639
[2024-07-31 22:16:39,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.25 | bwd_microstep: 5180.08 | bwd_inner_microstep: 5096.46 | bwd_allreduce_microstep: 83.56 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201
[2024-07-31 22:16:48,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.07 | bwd_microstep: 5208.20 | bwd_inner_microstep: 4802.69 | bwd_allreduce_microstep: 405.43 | step_microstep: 0.19
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 22:16:56,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3024.49 | bwd_microstep: 4910.37 | bwd_inner_microstep: 4531.39 | bwd_allreduce_microstep: 378.92 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710
[2024-07-31 22:17:05,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72
[2024-07-31 22:17:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.94 | bwd_microstep: 4894.33 | bwd_inner_microstep: 4874.96 | bwd_allreduce_microstep: 19.31 | step_microstep: 182.65
[2024-07-31 22:17:05,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28304.97 | bwd: 40567.54 | bwd_inner: 39163.11 | bwd_allreduce: 1403.94 | step: 183.35
83%|████████▎ | 1021/1230 [20:05:10<4:02:37, 69.65s/it]
{'loss': 1.1545, 'learning_rate': 1.476695682530268e-06, 'epoch': 0.83}
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2418
[2024-07-31 22:17:14,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.77 | bwd_microstep: 5538.77 | bwd_inner_microstep: 5111.61 | bwd_allreduce_microstep: 427.09 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3851
[2024-07-31 22:17:22,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.69 | bwd_microstep: 5124.05 | bwd_inner_microstep: 5087.60 | bwd_allreduce_microstep: 36.38 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2065
[2024-07-31 22:17:30,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3007.73 | bwd_microstep: 4986.48 | bwd_inner_microstep: 4606.02 | bwd_allreduce_microstep: 380.39 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223
[2024-07-31 22:17:39,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.55 | bwd_microstep: 5169.94 | bwd_inner_microstep: 4769.29 | bwd_allreduce_microstep: 400.58 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 22:17:48,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.54 | bwd_microstep: 5115.83 | bwd_inner_microstep: 4719.12 | bwd_allreduce_microstep: 396.64 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715
[2024-07-31 22:17:56,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.52 | bwd_microstep: 4988.85 | bwd_inner_microstep: 4969.40 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669
[2024-07-31 22:18:05,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.02 | bwd_microstep: 5006.28 | bwd_inner_microstep: 4950.24 | bwd_allreduce_microstep: 55.98 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2134
[2024-07-31 22:18:14,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 22:18:14,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.40 | bwd_microstep: 5094.37 | bwd_inner_microstep: 4698.64 | bwd_allreduce_microstep: 395.66 | step_microstep: 181.60
[2024-07-31 22:18:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27950.12 | bwd: 41024.56 | bwd_inner: 38911.87 | bwd_allreduce: 2112.21 | step: 182.19
83%|████████▎ | 1022/1230 [20:06:20<4:01:05, 69.55s/it]
{'loss': 1.116, 'learning_rate': 1.4629527073018267e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4037
[2024-07-31 22:18:23,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.13 | bwd_microstep: 5366.45 | bwd_inner_microstep: 5347.39 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3863
[2024-07-31 22:18:31,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3279.00 | bwd_microstep: 4916.68 | bwd_inner_microstep: 4897.25 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792
[2024-07-31 22:18:39,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3248.72 | bwd_microstep: 4896.91 | bwd_inner_microstep: 4868.92 | bwd_allreduce_microstep: 27.92 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757
[2024-07-31 22:18:48,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.72 | bwd_microstep: 5031.58 | bwd_inner_microstep: 5007.30 | bwd_allreduce_microstep: 24.21 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3631
[2024-07-31 22:18:56,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3085.81 | bwd_microstep: 4893.17 | bwd_inner_microstep: 4837.73 | bwd_allreduce_microstep: 55.37 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 22:19:05,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.85 | bwd_microstep: 5016.84 | bwd_inner_microstep: 4978.57 | bwd_allreduce_microstep: 38.20 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145
[2024-07-31 22:19:14,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.29 | bwd_microstep: 5029.10 | bwd_inner_microstep: 4641.91 | bwd_allreduce_microstep: 387.12 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3672
[2024-07-31 22:19:22,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:19:22,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.73 | bwd_microstep: 5023.15 | bwd_inner_microstep: 4955.90 | bwd_allreduce_microstep: 67.19 | step_microstep: 181.47
[2024-07-31 22:19:22,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27986.16 | bwd: 40173.87 | bwd_inner: 39534.91 | bwd_allreduce: 638.47 | step: 182.05
83%|████████▎ | 1023/1230 [20:07:28<3:58:50, 69.23s/it]
{'loss': 1.1662, 'learning_rate': 1.4492689326987408e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825
[2024-07-31 22:19:31,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3825.26 | bwd_microstep: 5296.82 | bwd_inner_microstep: 5248.24 | bwd_allreduce_microstep: 48.51 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3074
[2024-07-31 22:19:40,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.67 | bwd_microstep: 5233.25 | bwd_inner_microstep: 4935.45 | bwd_allreduce_microstep: 297.73 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2089
[2024-07-31 22:19:49,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.24 | bwd_microstep: 5249.99 | bwd_inner_microstep: 4842.22 | bwd_allreduce_microstep: 407.70 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794
[2024-07-31 22:19:58,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.27 | bwd_microstep: 5159.77 | bwd_inner_microstep: 5111.72 | bwd_allreduce_microstep: 47.98 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 22:20:06,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.11 | bwd_microstep: 4833.29 | bwd_inner_microstep: 4789.09 | bwd_allreduce_microstep: 44.13 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3675
[2024-07-31 22:20:14,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3117.01 | bwd_microstep: 5001.99 | bwd_inner_microstep: 4941.33 | bwd_allreduce_microstep: 60.60 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1146
[2024-07-31 22:20:23,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3435.85 | bwd_microstep: 5072.52 | bwd_inner_microstep: 4679.92 | bwd_allreduce_microstep: 392.54 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 22:20:31,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 22:20:31,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.17 | bwd_microstep: 5026.70 | bwd_inner_microstep: 4972.86 | bwd_allreduce_microstep: 53.78 | step_microstep: 181.58
[2024-07-31 22:20:31,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27895.49 | bwd: 40874.31 | bwd_inner: 39520.76 | bwd_allreduce: 1353.08 | step: 182.17
83%|████████▎ | 1024/1230 [20:08:37<3:57:33, 69.19s/it]
{'loss': 1.1887, 'learning_rate': 1.4356444536119085e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096
[2024-07-31 22:20:41,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3882.44 | bwd_microstep: 5400.13 | bwd_inner_microstep: 5380.98 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3875
[2024-07-31 22:20:50,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.21 | bwd_microstep: 5394.84 | bwd_inner_microstep: 5339.48 | bwd_allreduce_microstep: 55.29 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726
[2024-07-31 22:20:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.98 | bwd_microstep: 5179.42 | bwd_inner_microstep: 5138.86 | bwd_allreduce_microstep: 40.49 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740
[2024-07-31 22:21:08,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.50 | bwd_microstep: 5131.66 | bwd_inner_microstep: 5080.27 | bwd_allreduce_microstep: 51.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734
[2024-07-31 22:21:16,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.90 | bwd_microstep: 4798.64 | bwd_inner_microstep: 4779.32 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667
[2024-07-31 22:21:24,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.98 | bwd_microstep: 4708.42 | bwd_inner_microstep: 4686.45 | bwd_allreduce_microstep: 21.91 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642
[2024-07-31 22:21:32,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.33 | bwd_microstep: 4998.06 | bwd_inner_microstep: 4944.45 | bwd_allreduce_microstep: 53.55 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2115
[2024-07-31 22:21:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:21:41,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.04 | bwd_microstep: 5256.10 | bwd_inner_microstep: 4848.11 | bwd_allreduce_microstep: 407.92 | step_microstep: 181.99
[2024-07-31 22:21:41,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28522.27 | bwd: 40867.26 | bwd_inner: 40197.86 | bwd_allreduce: 668.91 | step: 182.68
83%|████████▎ | 1025/1230 [20:09:47<3:56:56, 69.35s/it]
{'loss': 1.111, 'learning_rate': 1.422079364521024e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830
[2024-07-31 22:21:50,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.16 | bwd_microstep: 5067.02 | bwd_inner_microstep: 5047.93 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249
[2024-07-31 22:21:59,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.90 | bwd_microstep: 5206.71 | bwd_inner_microstep: 4801.94 | bwd_allreduce_microstep: 404.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144
[2024-07-31 22:22:08,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.24 | bwd_microstep: 5273.56 | bwd_inner_microstep: 4866.09 | bwd_allreduce_microstep: 407.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715
[2024-07-31 22:22:16,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.41 | bwd_microstep: 5171.92 | bwd_inner_microstep: 5115.08 | bwd_allreduce_microstep: 56.78 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181
[2024-07-31 22:22:25,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.99 | bwd_microstep: 5063.07 | bwd_inner_microstep: 4670.61 | bwd_allreduce_microstep: 392.39 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 22:22:34,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.00 | bwd_microstep: 5004.81 | bwd_inner_microstep: 4953.59 | bwd_allreduce_microstep: 51.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737
[2024-07-31 22:22:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.17 | bwd_microstep: 4998.62 | bwd_inner_microstep: 4979.21 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710
[2024-07-31 22:22:51,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76
[2024-07-31 22:22:51,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.25 | bwd_microstep: 4900.09 | bwd_inner_microstep: 4880.70 | bwd_allreduce_microstep: 19.32 | step_microstep: 182.68
[2024-07-31 22:22:51,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28983.03 | bwd: 40685.77 | bwd_inner: 39315.08 | bwd_allreduce: 1370.20 | step: 183.25
83%|████████▎ | 1026/1230 [20:10:57<3:56:27, 69.55s/it]
{'loss': 1.1384, 'learning_rate': 1.4085737594939519e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3516
[2024-07-31 22:23:00,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.03 | bwd_microstep: 5474.03 | bwd_inner_microstep: 5286.48 | bwd_allreduce_microstep: 187.48 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211
[2024-07-31 22:23:09,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.60 | bwd_microstep: 5239.02 | bwd_inner_microstep: 4833.46 | bwd_allreduce_microstep: 405.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751
[2024-07-31 22:23:18,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.21 | bwd_microstep: 5061.31 | bwd_inner_microstep: 5032.87 | bwd_allreduce_microstep: 28.38 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734
[2024-07-31 22:23:27,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.25 | bwd_microstep: 5071.43 | bwd_inner_microstep: 5041.88 | bwd_allreduce_microstep: 29.48 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720
[2024-07-31 22:23:36,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.82 | bwd_microstep: 4982.13 | bwd_inner_microstep: 4962.81 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173
[2024-07-31 22:23:44,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.60 | bwd_microstep: 5106.74 | bwd_inner_microstep: 4710.68 | bwd_allreduce_microstep: 396.00 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3659
[2024-07-31 22:23:53,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.79 | bwd_microstep: 5076.73 | bwd_inner_microstep: 4997.16 | bwd_allreduce_microstep: 79.50 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 22:24:02,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 22:24:02,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.84 | bwd_microstep: 5004.28 | bwd_inner_microstep: 4946.37 | bwd_allreduce_microstep: 57.84 | step_microstep: 180.85
[2024-07-31 22:24:02,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29197.05 | bwd: 41015.65 | bwd_inner: 39811.66 | bwd_allreduce: 1203.50 | step: 181.44
83%|████████▎ | 1027/1230 [20:12:08<3:56:18, 69.85s/it]
{'loss': 1.0839, 'learning_rate': 1.3951277321860468e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3935
[2024-07-31 22:24:10,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3283.95 | bwd_microstep: 4982.36 | bwd_inner_microstep: 4963.32 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574
[2024-07-31 22:24:19,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.15 | bwd_microstep: 5166.04 | bwd_inner_microstep: 5077.03 | bwd_allreduce_microstep: 88.95 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754
[2024-07-31 22:24:28,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.30 | bwd_microstep: 5024.18 | bwd_inner_microstep: 5003.04 | bwd_allreduce_microstep: 21.07 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732
[2024-07-31 22:24:36,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.23 | bwd_microstep: 5142.82 | bwd_inner_microstep: 5086.91 | bwd_allreduce_microstep: 55.85 | step_microstep: 0.08
dynamic ViT batch
size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 22:24:45,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3306.10 | bwd_microstep: 4889.21 | bwd_inner_microstep: 4851.36 | bwd_allreduce_microstep: 37.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 22:24:53,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.15 | bwd_microstep: 5123.31 | bwd_inner_microstep: 4724.06 | bwd_allreduce_microstep: 399.18 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-31 22:25:02,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.29 | bwd_microstep: 4923.51 | bwd_inner_microstep: 4900.99 | bwd_allreduce_microstep: 22.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 22:25:11,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 22:25:11,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.09 | bwd_microstep: 5051.48 | bwd_inner_microstep: 4993.07 | bwd_allreduce_microstep: 58.33 | step_microstep: 181.44 [2024-07-31 22:25:11,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28385.17 | bwd: 40302.88 | bwd_inner: 39599.71 | bwd_allreduce: 702.68 | step: 182.03 84%|████████▎ | 1028/1230 [20:13:17<3:54:18, 69.60s/it] {'loss': 1.167, 'learning_rate': 1.381741375839537e-06, 'epoch': 0.84} 84%|████████▎ | 1028/1230 [20:13:17<3:54:18, 69.60s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-31 22:25:20,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.98 | bwd_microstep: 5165.56 | bwd_inner_microstep: 5119.67 | bwd_allreduce_microstep: 45.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2332 [2024-07-31 
22:25:28,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.78 | bwd_microstep: 5224.43 | bwd_inner_microstep: 4818.69 | bwd_allreduce_microstep: 405.68 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 22:25:37,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.20 | bwd_microstep: 5025.25 | bwd_inner_microstep: 4986.12 | bwd_allreduce_microstep: 39.06 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2077 [2024-07-31 22:25:46,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.27 | bwd_microstep: 5206.07 | bwd_inner_microstep: 4803.84 | bwd_allreduce_microstep: 402.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3650 [2024-07-31 22:25:55,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.88 | bwd_microstep: 5117.30 | bwd_inner_microstep: 5033.09 | bwd_allreduce_microstep: 84.14 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 22:26:03,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.10 | bwd_microstep: 5177.15 | bwd_inner_microstep: 5099.91 | bwd_allreduce_microstep: 77.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-31 22:26:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.39 | bwd_microstep: 4910.81 | bwd_inner_microstep: 4891.49 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2154 [2024-07-31 22:26:21,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 22:26:21,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.69 | bwd_microstep: 5089.26 | 
bwd_inner_microstep: 4693.95 | bwd_allreduce_microstep: 395.23 | step_microstep: 182.32 [2024-07-31 22:26:21,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28806.20 | bwd: 40915.80 | bwd_inner: 39446.70 | bwd_allreduce: 1468.62 | step: 183.00 84%|████████▎ | 1029/1230 [20:14:27<3:53:36, 69.73s/it] {'loss': 1.1495, 'learning_rate': 1.3684147832828409e-06, 'epoch': 0.84} 84%|████████▎ | 1029/1230 [20:14:27<3:53:36, 69.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3914 [2024-07-31 22:26:30,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.18 | bwd_microstep: 5391.55 | bwd_inner_microstep: 5326.38 | bwd_allreduce_microstep: 65.10 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2808 [2024-07-31 22:26:39,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.87 | bwd_microstep: 5265.46 | bwd_inner_microstep: 4857.06 | bwd_allreduce_microstep: 408.34 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-31 22:26:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.02 | bwd_microstep: 5174.82 | bwd_inner_microstep: 4772.03 | bwd_allreduce_microstep: 402.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-31 22:26:56,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3249.02 | bwd_microstep: 4955.52 | bwd_inner_microstep: 4906.00 | bwd_allreduce_microstep: 49.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-07-31 22:27:04,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.96 | bwd_microstep: 5018.59 | bwd_inner_microstep: 4999.16 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 
[2024-07-31 22:27:13,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.06 | bwd_microstep: 5016.54 | bwd_inner_microstep: 4964.94 | bwd_allreduce_microstep: 51.54 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2157
[2024-07-31 22:27:22,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3445.37 | bwd_microstep: 5035.98 | bwd_inner_microstep: 4646.32 | bwd_allreduce_microstep: 389.60 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699
[2024-07-31 22:27:30,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60
[2024-07-31 22:27:30,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.81 | bwd_microstep: 5131.52 | bwd_inner_microstep: 5064.99 | bwd_allreduce_microstep: 66.47 | step_microstep: 182.77
[2024-07-31 22:27:30,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28338.20 | bwd: 40989.98 | bwd_inner: 39536.83 | bwd_allreduce: 1452.68 | step: 183.36
84%|████████▎ | 1030/1230 [20:15:36<3:52:22, 69.71s/it] {'loss': 1.176, 'learning_rate': 1.3551480469299538e-06, 'epoch': 0.84}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4064
[2024-07-31 22:27:40,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.08 | bwd_microstep: 5365.79 | bwd_inner_microstep: 5346.71 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863
[2024-07-31 22:27:49,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.32 | bwd_microstep: 5117.56 | bwd_inner_microstep: 5098.18 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2243
[2024-07-31 22:27:57,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.49 | bwd_microstep: 5260.19 | bwd_inner_microstep: 4854.30 | bwd_allreduce_microstep: 405.83 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750
[2024-07-31 22:28:06,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.40 | bwd_microstep: 5102.35 | bwd_inner_microstep: 5071.08 | bwd_allreduce_microstep: 31.21 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201
[2024-07-31 22:28:15,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.06 | bwd_microstep: 5120.89 | bwd_inner_microstep: 4722.97 | bwd_allreduce_microstep: 397.85 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696
[2024-07-31 22:28:24,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.61 | bwd_microstep: 4891.10 | bwd_inner_microstep: 4870.71 | bwd_allreduce_microstep: 20.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649
[2024-07-31 22:28:32,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.54 | bwd_microstep: 5121.73 | bwd_inner_microstep: 5051.76 | bwd_allreduce_microstep: 69.90 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3697
[2024-07-31 22:28:41,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 22:28:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.08 | bwd_microstep: 5004.34 | bwd_inner_microstep: 4937.10 | bwd_allreduce_microstep: 67.17 | step_microstep: 181.78
[2024-07-31 22:28:41,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29312.47 | bwd: 40983.94 | bwd_inner: 39952.76 | bwd_allreduce: 1030.69 | step: 182.35
84%|████████▍ | 1031/1230 [20:16:47<3:52:07, 69.99s/it] {'loss': 1.0655, 'learning_rate': 1.3419412587797887e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645
[2024-07-31 22:28:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.55 | bwd_microstep: 5355.18 | bwd_inner_microstep: 5257.66 | bwd_allreduce_microstep: 97.45 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825
[2024-07-31 22:28:59,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.93 | bwd_microstep: 5125.64 | bwd_inner_microstep: 5085.34 | bwd_allreduce_microstep: 40.23 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2208
[2024-07-31 22:29:08,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.96 | bwd_microstep: 5265.23 | bwd_inner_microstep: 4856.51 | bwd_allreduce_microstep: 408.65 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732
[2024-07-31 22:29:16,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.16 | bwd_microstep: 4990.82 | bwd_inner_microstep: 4971.43 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2104
[2024-07-31 22:29:25,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.47 | bwd_microstep: 5271.03 | bwd_inner_microstep: 4861.59 | bwd_allreduce_microstep: 409.37 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646
[2024-07-31 22:29:34,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.30 | bwd_microstep: 5133.47 | bwd_inner_microstep: 5060.26 | bwd_allreduce_microstep: 73.15 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3668
[2024-07-31 22:29:43,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.07 | bwd_microstep: 5060.80 | bwd_inner_microstep: 5017.64 | bwd_allreduce_microstep: 43.10 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 22:29:51,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 22:29:51,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.11 | bwd_microstep: 4927.65 | bwd_inner_microstep: 4546.92 | bwd_allreduce_microstep: 380.65 | step_microstep: 181.53
[2024-07-31 22:29:51,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28631.45 | bwd: 41129.80 | bwd_inner: 39657.29 | bwd_allreduce: 1472.02 | step: 182.11
84%|████████▍ | 1032/1230 [20:17:57<3:51:03, 70.02s/it] {'loss': 1.1679, 'learning_rate': 1.3287945104155508e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824
[2024-07-31 22:30:01,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.74 | bwd_microstep: 5608.60 | bwd_inner_microstep: 5503.31 | bwd_allreduce_microstep: 105.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588
[2024-07-31 22:30:09,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3342.91 | bwd_microstep: 5109.38 | bwd_inner_microstep: 5041.25 | bwd_allreduce_microstep: 68.06 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2283
[2024-07-31 22:30:17,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3280.93 | bwd_microstep: 5102.13 | bwd_inner_microstep: 4706.33 | bwd_allreduce_microstep: 395.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576
[2024-07-31 22:30:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.27 | bwd_microstep: 5103.43 | bwd_inner_microstep: 5027.96 | bwd_allreduce_microstep: 75.40 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3640
[2024-07-31 22:30:35,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.59 | bwd_microstep: 5173.51 | bwd_inner_microstep: 5079.68 | bwd_allreduce_microstep: 93.76 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3704
[2024-07-31 22:30:44,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.91 | bwd_microstep: 5144.77 | bwd_inner_microstep: 5060.08 | bwd_allreduce_microstep: 84.63 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691
[2024-07-31 22:30:52,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.80 | bwd_microstep: 4997.02 | bwd_inner_microstep: 4947.19 | bwd_allreduce_microstep: 49.76 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674
[2024-07-31 22:31:01,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 22:31:01,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.71 | bwd_microstep: 5104.00 | bwd_inner_microstep: 5033.92 | bwd_allreduce_microstep: 70.01 | step_microstep: 181.70
[2024-07-31 22:31:01,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28260.76 | bwd: 41342.82 | bwd_inner: 40399.68 | bwd_allreduce: 942.66 | step: 182.28
84%|████████▍ | 1033/1230 [20:19:07<3:49:48, 69.99s/it] {'loss': 1.1448, 'learning_rate': 1.3157078930040856e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3939
[2024-07-31 22:31:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.04 | bwd_microstep: 5478.96 | bwd_inner_microstep: 5404.39 | bwd_allreduce_microstep: 74.50 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4007
[2024-07-31 22:31:19,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.92 | bwd_microstep: 5259.29 | bwd_inner_microstep: 5239.94 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3738
[2024-07-31 22:31:28,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.57 | bwd_microstep: 5178.85 | bwd_inner_microstep: 5134.94 | bwd_allreduce_microstep: 43.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800
[2024-07-31 22:31:37,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.61 | bwd_microstep: 5175.88 | bwd_inner_microstep: 5125.26 | bwd_allreduce_microstep: 50.55 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775
[2024-07-31 22:31:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.55 | bwd_microstep: 5211.55 | bwd_inner_microstep: 5154.60 | bwd_allreduce_microstep: 56.88 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664
[2024-07-31 22:31:55,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.33 | bwd_microstep: 5167.77 | bwd_inner_microstep: 5096.76 | bwd_allreduce_microstep: 70.94 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663
[2024-07-31 22:32:03,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.17 | bwd_microstep: 5004.33 | bwd_inner_microstep: 4947.97 | bwd_allreduce_microstep: 56.29 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149
[2024-07-31 22:32:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:32:12,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.03 | bwd_microstep: 5125.15 | bwd_inner_microstep: 4727.45 | bwd_allreduce_microstep: 397.63 | step_microstep: 183.24
[2024-07-31 22:32:12,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29185.11 | bwd: 41601.76 | bwd_inner: 40831.25 | bwd_allreduce: 770.02 | step: 183.84
84%|████████▍ | 1034/1230 [20:20:18<3:49:45, 70.33s/it] {'loss': 1.1494, 'learning_rate': 1.3026814972952674e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3961
[2024-07-31 22:32:21,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.33 | bwd_microstep: 5525.05 | bwd_inner_microstep: 5439.17 | bwd_allreduce_microstep: 85.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3904
[2024-07-31 22:32:30,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.75 | bwd_microstep: 5241.28 | bwd_inner_microstep: 5187.85 | bwd_allreduce_microstep: 53.37 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807
[2024-07-31 22:32:39,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.82 | bwd_microstep: 5037.23 | bwd_inner_microstep: 5016.65 | bwd_allreduce_microstep: 20.51 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 22:32:47,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.20 | bwd_microstep: 4868.51 | bwd_inner_microstep: 4842.74 | bwd_allreduce_microstep: 25.69 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3625
[2024-07-31 22:32:56,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.11 | bwd_microstep: 5009.59 | bwd_inner_microstep: 4937.17 | bwd_allreduce_microstep: 72.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 22:33:05,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.79 | bwd_microstep: 4962.64 | bwd_inner_microstep: 4904.33 | bwd_allreduce_microstep: 58.25 | step_microstep: 0.09
dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1154
[2024-07-31 22:33:13,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.43 | bwd_microstep: 5169.21 | bwd_inner_microstep: 4770.29 | bwd_allreduce_microstep: 398.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680
[2024-07-31 22:33:22,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 22:33:22,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.27 | bwd_microstep: 5069.49 | bwd_inner_microstep: 5009.18 | bwd_allreduce_microstep: 60.24 | step_microstep: 181.98
[2024-07-31 22:33:22,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28735.63 | bwd: 40882.98 | bwd_inner: 40107.32 | bwd_allreduce: 775.18 | step: 182.69
84%|████████▍ | 1035/1230 [20:21:28<3:48:12, 70.22s/it] {'loss': 1.121, 'learning_rate': 1.2897154136213542e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3867
[2024-07-31 22:33:31,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.13 | bwd_microstep: 5467.93 | bwd_inner_microstep: 5381.80 | bwd_allreduce_microstep: 86.05 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 22:33:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.63 | bwd_microstep: 5022.82 | bwd_inner_microstep: 4999.21 | bwd_allreduce_microstep: 23.53 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2244
[2024-07-31 22:33:49,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.85 | bwd_microstep: 5083.14 | bwd_inner_microstep: 4688.24 | bwd_allreduce_microstep: 394.83 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739
[2024-07-31 22:33:57,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3266.89 | bwd_microstep: 4821.67 | bwd_inner_microstep: 4796.05 | bwd_allreduce_microstep: 25.55 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141
[2024-07-31 22:34:06,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.22 | bwd_microstep: 5237.29 | bwd_inner_microstep: 4831.96 | bwd_allreduce_microstep: 405.26 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2113
[2024-07-31 22:34:14,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.53 | bwd_microstep: 5147.30 | bwd_inner_microstep: 4751.47 | bwd_allreduce_microstep: 395.77 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739
[2024-07-31 22:34:23,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.57 | bwd_microstep: 4993.09 | bwd_inner_microstep: 4973.74 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659
[2024-07-31 22:34:32,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61
[2024-07-31 22:34:32,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.44 | bwd_microstep: 4997.84 | bwd_inner_microstep: 4948.29 | bwd_allreduce_microstep: 49.48 | step_microstep: 181.69
[2024-07-31 22:34:32,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28558.16 | bwd: 40771.05 | bwd_inner: 39370.71 | bwd_allreduce: 1399.85 | step: 182.26
84%|████████▍ | 1036/1230 [20:22:38<3:46:29, 70.05s/it] {'loss': 1.1679, 'learning_rate': 1.2768097318963701e-06, 'epoch': 0.84}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3981
[2024-07-31 22:34:41,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.33 | bwd_microstep: 5345.22 | bwd_inner_microstep: 5301.42 | bwd_allreduce_microstep: 43.74 | step_microstep: 0.08
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3813
[2024-07-31 22:34:49,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.98 | bwd_microstep: 4941.68 | bwd_inner_microstep: 4922.39 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641
[2024-07-31 22:34:58,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3355.85 | bwd_microstep: 5028.19 | bwd_inner_microstep: 4973.03 | bwd_allreduce_microstep: 55.08 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613
[2024-07-31 22:35:07,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.00 | bwd_microstep: 5145.87 | bwd_inner_microstep: 5064.30 | bwd_allreduce_microstep: 81.50 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202
[2024-07-31 22:35:15,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.06 | bwd_microstep: 5112.85 | bwd_inner_microstep: 4716.11 | bwd_allreduce_microstep: 396.67 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720
[2024-07-31 22:35:24,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.73 | bwd_microstep: 5016.66 | bwd_inner_microstep: 4980.73 | bwd_allreduce_microstep: 35.86 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717
[2024-07-31 22:35:32,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.44 | bwd_microstep: 4805.45 | bwd_inner_microstep: 4784.87 | bwd_allreduce_microstep: 20.48 | step_microstep: 0.13
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733
[2024-07-31 22:35:41,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 22:35:41,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.66 | bwd_microstep: 4928.03 | bwd_inner_microstep: 4899.07 | bwd_allreduce_microstep: 28.90 | step_microstep: 181.63
[2024-07-31 22:35:41,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28106.96 | bwd: 40323.93 | bwd_inner: 39641.86 | bwd_allreduce: 681.56 | step: 182.25
84%|████████▍ | 1037/1230 [20:23:46<3:44:05, 69.67s/it] {'loss': 1.1609, 'learning_rate': 1.2639645416154744e-06, 'epoch': 0.84}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4006
[2024-07-31 22:35:50,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.28 | bwd_microstep: 5271.12 | bwd_inner_microstep: 5251.98 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3779
[2024-07-31 22:35:58,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.85 | bwd_microstep: 4981.84 | bwd_inner_microstep: 4940.36 | bwd_allreduce_microstep: 41.42 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3050
[2024-07-31 22:36:07,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.24 | bwd_microstep: 5153.46 | bwd_inner_microstep: 4864.24 | bwd_allreduce_microstep: 289.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628
[2024-07-31 22:36:16,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3341.73 | bwd_microstep: 5211.81 | bwd_inner_microstep: 5131.96 | bwd_allreduce_microstep: 79.78 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739
[2024-07-31 22:36:24,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.48 | bwd_microstep: 4835.35 | bwd_inner_microstep: 4811.48 | bwd_allreduce_microstep: 23.81 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175
[2024-07-31 22:36:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2995.13 | bwd_microstep: 4847.31 | bwd_inner_microstep: 4471.52 | bwd_allreduce_microstep: 375.72 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677
[2024-07-31 22:36:40,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.74 | bwd_microstep: 4970.76 | bwd_inner_microstep: 4951.44 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671
[2024-07-31 22:36:49,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65
[2024-07-31 22:36:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.30 | bwd_microstep: 4888.40 | bwd_inner_microstep: 4869.03 | bwd_allreduce_microstep: 19.30 | step_microstep: 181.55
[2024-07-31 22:36:49,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27929.66 | bwd: 40160.04 | bwd_inner: 39291.95 | bwd_allreduce: 867.60 | step: 182.13
84%|████████▍ | 1038/1230 [20:24:55<3:41:44, 69.29s/it] {'loss': 1.1445, 'learning_rate': 1.2511799318543493e-06, 'epoch': 0.84}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4037
[2024-07-31 22:36:58,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.42 | bwd_microstep: 5222.63 | bwd_inner_microstep: 5203.60 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2322
[2024-07-31 22:37:07,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.81 | bwd_microstep: 5169.27 | bwd_inner_microstep: 4764.67 | bwd_allreduce_microstep: 404.52 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2276
[2024-07-31 22:37:16,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.43 | bwd_microstep: 5467.25 | bwd_inner_microstep: 5044.76 | bwd_allreduce_microstep: 422.42 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834
[2024-07-31 22:37:25,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.60 | bwd_microstep: 5045.15 | bwd_inner_microstep: 5025.88 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720
[2024-07-31 22:37:33,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.89 | bwd_microstep: 4986.29 | bwd_inner_microstep: 4966.91 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2117
[2024-07-31 22:37:42,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.70 | bwd_microstep: 5202.98 | bwd_inner_microstep: 4801.21 | bwd_allreduce_microstep: 401.71 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682
[2024-07-31 22:37:51,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.40 | bwd_microstep: 5043.22 | bwd_inner_microstep: 4985.39 | bwd_allreduce_microstep: 57.77 | step_microstep: 0.08
dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 639
[2024-07-31 22:38:00,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59
[2024-07-31 22:38:00,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3437.80 | bwd_microstep: 5141.00 | bwd_inner_microstep: 4744.23 | bwd_allreduce_microstep: 396.71 | step_microstep: 181.78
[2024-07-31 22:38:00,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28951.96 | bwd: 41277.77 | bwd_inner: 39536.59 | bwd_allreduce: 1740.68 | step: 182.37
84%|████████▍ | 1039/1230 [20:26:05<3:41:47, 69.67s/it] {'loss': 1.1041, 'learning_rate': 1.2384559912685768e-06, 'epoch': 0.84}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3843
[2024-07-31 22:38:09,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3940.55 | bwd_microstep: 5081.90 | bwd_inner_microstep: 5009.44 | bwd_allreduce_microstep: 72.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756
[2024-07-31 22:38:18,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3935.22 | bwd_microstep: 5155.92 | bwd_inner_microstep: 5103.21 | bwd_allreduce_microstep: 52.65 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775
[2024-07-31 22:38:27,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.46 | bwd_microstep: 5159.75 | bwd_inner_microstep: 5107.88 | bwd_allreduce_microstep: 51.80 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245
[2024-07-31 22:38:35,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.80 | bwd_microstep: 5046.99 | bwd_inner_microstep: 4655.38 | bwd_allreduce_microstep: 391.55 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707
[2024-07-31 22:38:44,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.05 | bwd_microstep: 5050.15 | bwd_inner_microstep: 5008.40 | bwd_allreduce_microstep: 41.68 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3681
[2024-07-31 22:38:53,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.87 | bwd_microstep: 5040.86 | bwd_inner_microstep: 4967.53 | bwd_allreduce_microstep: 73.26 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 22:39:01,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.41 | bwd_microstep: 4997.63 | bwd_inner_microstep: 4978.22 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2116
[2024-07-31 22:39:09,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:39:09,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3024.57 | bwd_microstep: 4918.99 | bwd_inner_microstep: 4542.87 | bwd_allreduce_microstep: 376.05 | step_microstep: 181.92
[2024-07-31 22:39:09,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29044.83 | bwd: 40452.17 | bwd_inner: 39372.86 | bwd_allreduce: 1078.82 | step: 182.51
85%|████████▍ | 1040/1230 [20:27:15<3:40:46, 69.72s/it] {'loss': 1.1081, 'learning_rate': 1.225792808093025e-06, 'epoch': 0.85}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882
[2024-07-31 22:39:19,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.07 | bwd_microstep: 5586.57 | bwd_inner_microstep: 5483.69 | bwd_allreduce_microstep: 102.82 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646
[2024-07-31 22:39:27,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3167.18 | bwd_microstep: 4651.30 | bwd_inner_microstep: 4632.00 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08
dynamic ViT batch
size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-31 22:39:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3043.56 | bwd_microstep: 5048.77 | bwd_inner_microstep: 4655.94 | bwd_allreduce_microstep: 392.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-31 22:39:44,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.42 | bwd_microstep: 5042.90 | bwd_inner_microstep: 5018.13 | bwd_allreduce_microstep: 24.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 22:39:52,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.13 | bwd_microstep: 5076.71 | bwd_inner_microstep: 5025.96 | bwd_allreduce_microstep: 50.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-31 22:40:01,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.77 | bwd_microstep: 5043.44 | bwd_inner_microstep: 4651.93 | bwd_allreduce_microstep: 391.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 22:40:10,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.15 | bwd_microstep: 5073.58 | bwd_inner_microstep: 5003.90 | bwd_allreduce_microstep: 69.61 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2106 [2024-07-31 22:40:18,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-31 22:40:18,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.55 | bwd_microstep: 5067.42 | bwd_inner_microstep: 4675.72 | bwd_allreduce_microstep: 391.63 | step_microstep: 181.82 [2024-07-31 22:40:18,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28078.74 | bwd: 40590.68 | bwd_inner: 39147.21 | bwd_allreduce: 
1442.99 | step: 182.41 85%|████████▍ | 1041/1230 [20:28:24<3:38:56, 69.50s/it] {'loss': 1.1605, 'learning_rate': 1.2131904701412322e-06, 'epoch': 0.85} 85%|████████▍ | 1041/1230 [20:28:24<3:38:56, 69.50s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3879 [2024-07-31 22:40:27,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3833.05 | bwd_microstep: 5237.42 | bwd_inner_microstep: 5204.13 | bwd_allreduce_microstep: 33.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2285 [2024-07-31 22:40:36,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3371.94 | bwd_microstep: 5278.08 | bwd_inner_microstep: 4870.04 | bwd_allreduce_microstep: 407.98 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3764 [2024-07-31 22:40:45,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.97 | bwd_microstep: 5120.84 | bwd_inner_microstep: 5054.51 | bwd_allreduce_microstep: 66.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 22:40:54,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.80 | bwd_microstep: 5191.04 | bwd_inner_microstep: 5109.23 | bwd_allreduce_microstep: 81.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-31 22:41:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.15 | bwd_microstep: 5044.70 | bwd_inner_microstep: 5025.36 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2091 [2024-07-31 22:41:10,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3018.76 | bwd_microstep: 4881.75 | bwd_inner_microstep: 4507.17 | bwd_allreduce_microstep: 374.51 | step_microstep: 0.08 dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3671 [2024-07-31 22:41:19,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.69 | bwd_microstep: 4864.05 | bwd_inner_microstep: 4844.61 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673 [2024-07-31 22:41:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-31 22:41:28,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.29 | bwd_microstep: 4867.25 | bwd_inner_microstep: 4847.86 | bwd_allreduce_microstep: 19.32 | step_microstep: 182.98 [2024-07-31 22:41:28,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28539.56 | bwd: 40485.12 | bwd_inner: 39462.87 | bwd_allreduce: 1021.76 | step: 183.56 85%|████████▍ | 1042/1230 [20:29:34<3:37:38, 69.46s/it] {'loss': 1.1238, 'learning_rate': 1.2006490648048118e-06, 'epoch': 0.85} 85%|████████▍ | 1042/1230 [20:29:34<3:37:38, 69.46s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2353 [2024-07-31 22:41:37,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3933.84 | bwd_microstep: 5448.24 | bwd_inner_microstep: 5027.14 | bwd_allreduce_microstep: 421.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 22:41:46,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.28 | bwd_microstep: 5148.87 | bwd_inner_microstep: 5116.84 | bwd_allreduce_microstep: 31.97 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3762 [2024-07-31 22:41:55,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.70 | bwd_microstep: 5111.31 | bwd_inner_microstep: 5071.14 | bwd_allreduce_microstep: 40.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-31 
22:42:04,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.24 | bwd_microstep: 5078.78 | bwd_inner_microstep: 5015.45 | bwd_allreduce_microstep: 63.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-31 22:42:12,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.54 | bwd_microstep: 5160.75 | bwd_inner_microstep: 5086.89 | bwd_allreduce_microstep: 73.79 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-31 22:42:21,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.23 | bwd_microstep: 4992.99 | bwd_inner_microstep: 4973.66 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 22:42:30,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.10 | bwd_microstep: 5055.02 | bwd_inner_microstep: 5030.77 | bwd_allreduce_microstep: 24.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-31 22:42:39,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 22:42:39,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.61 | bwd_microstep: 5090.84 | bwd_inner_microstep: 4695.73 | bwd_allreduce_microstep: 395.04 | step_microstep: 181.77 [2024-07-31 22:42:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29504.46 | bwd: 41086.78 | bwd_inner: 40017.56 | bwd_allreduce: 1068.73 | step: 182.38 85%|████████▍ | 1043/1230 [20:30:45<3:37:51, 69.90s/it] {'loss': 1.1187, 'learning_rate': 1.1881686790528279e-06, 'epoch': 0.85} 85%|████████▍ | 1043/1230 [20:30:45<3:37:51, 69.90s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3962 [2024-07-31 22:42:48,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3803.23 | bwd_microstep: 5215.68 | bwd_inner_microstep: 5196.61 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320 [2024-07-31 22:42:57,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.24 | bwd_microstep: 5250.12 | bwd_inner_microstep: 4842.60 | bwd_allreduce_microstep: 407.45 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2280 [2024-07-31 22:43:06,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.72 | bwd_microstep: 5377.48 | bwd_inner_microstep: 4961.40 | bwd_allreduce_microstep: 416.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-31 22:43:14,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.38 | bwd_microstep: 5218.64 | bwd_inner_microstep: 4811.25 | bwd_allreduce_microstep: 407.32 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-31 22:43:23,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.91 | bwd_microstep: 4971.57 | bwd_inner_microstep: 4952.24 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2161 [2024-07-31 22:43:32,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.35 | bwd_microstep: 5126.46 | bwd_inner_microstep: 4729.68 | bwd_allreduce_microstep: 396.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 22:43:41,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.36 | bwd_microstep: 5174.56 | bwd_inner_microstep: 5099.60 | bwd_allreduce_microstep: 74.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3658 [2024-07-31 
22:43:50,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 22:43:50,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.19 | bwd_microstep: 5187.22 | bwd_inner_microstep: 5094.79 | bwd_allreduce_microstep: 92.36 | step_microstep: 182.09 [2024-07-31 22:43:50,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28978.28 | bwd: 41521.71 | bwd_inner: 39688.12 | bwd_allreduce: 1833.11 | step: 182.80 85%|████████▍ | 1044/1230 [20:31:55<3:37:33, 70.18s/it] {'loss': 1.0732, 'learning_rate': 1.1757493994312052e-06, 'epoch': 0.85} 85%|████████▍ | 1044/1230 [20:31:55<3:37:33, 70.18s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3898 [2024-07-31 22:43:59,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3836.40 | bwd_microstep: 5191.61 | bwd_inner_microstep: 5159.06 | bwd_allreduce_microstep: 32.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807 [2024-07-31 22:44:07,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3279.15 | bwd_microstep: 4900.25 | bwd_inner_microstep: 4874.32 | bwd_allreduce_microstep: 25.86 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2066 [2024-07-31 22:44:16,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.23 | bwd_microstep: 5254.46 | bwd_inner_microstep: 4848.38 | bwd_allreduce_microstep: 406.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-31 22:44:24,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.12 | bwd_microstep: 5171.22 | bwd_inner_microstep: 5115.53 | bwd_allreduce_microstep: 55.62 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2182 [2024-07-31 22:44:32,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3049.84 | bwd_microstep: 4981.02 | bwd_inner_microstep: 4596.79 | bwd_allreduce_microstep: 384.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-31 22:44:41,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.06 | bwd_microstep: 4985.82 | bwd_inner_microstep: 4934.43 | bwd_allreduce_microstep: 51.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-31 22:44:50,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.94 | bwd_microstep: 5257.83 | bwd_inner_microstep: 4851.51 | bwd_allreduce_microstep: 406.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-31 22:44:58,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 22:44:58,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.13 | bwd_microstep: 4787.28 | bwd_inner_microstep: 4767.86 | bwd_allreduce_microstep: 19.35 | step_microstep: 181.57 [2024-07-31 22:44:58,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27661.77 | bwd: 40529.47 | bwd_inner: 39147.82 | bwd_allreduce: 1381.16 | step: 182.15 85%|████████▍ | 1045/1230 [20:33:04<3:34:50, 69.68s/it] {'loss': 1.1456, 'learning_rate': 1.1633913120621188e-06, 'epoch': 0.85} 85%|████████▍ | 1045/1230 [20:33:04<3:34:50, 69.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3879 [2024-07-31 22:45:07,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.37 | bwd_microstep: 4983.09 | bwd_inner_microstep: 4958.09 | bwd_allreduce_microstep: 24.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-31 22:45:16,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.67 | bwd_microstep: 5001.90 | bwd_inner_microstep: 
4982.50 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3604 [2024-07-31 22:45:24,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.94 | bwd_microstep: 5190.89 | bwd_inner_microstep: 5086.26 | bwd_allreduce_microstep: 104.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-31 22:45:33,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.90 | bwd_microstep: 5063.17 | bwd_inner_microstep: 5022.61 | bwd_allreduce_microstep: 40.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-31 22:45:42,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.54 | bwd_microstep: 5065.70 | bwd_inner_microstep: 4672.69 | bwd_allreduce_microstep: 392.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 22:45:50,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.43 | bwd_microstep: 5047.49 | bwd_inner_microstep: 4990.26 | bwd_allreduce_microstep: 57.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-31 22:45:59,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.79 | bwd_microstep: 5145.02 | bwd_inner_microstep: 4746.90 | bwd_allreduce_microstep: 398.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 22:46:08,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 22:46:08,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.55 | bwd_microstep: 5061.85 | bwd_inner_microstep: 5003.18 | bwd_allreduce_microstep: 58.60 | step_microstep: 182.59 [2024-07-31 22:46:08,224] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 28810.10 | bwd: 40559.10 | bwd_inner: 39462.44 | bwd_allreduce: 1096.18 | step: 183.16 85%|████████▌ | 1046/1230 [20:34:14<3:33:42, 69.69s/it] {'loss': 1.1755, 'learning_rate': 1.151094502643414e-06, 'epoch': 0.85} 85%|████████▌ | 1046/1230 [20:34:14<3:33:42, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-31 22:46:17,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.11 | bwd_microstep: 5358.13 | bwd_inner_microstep: 5217.76 | bwd_allreduce_microstep: 140.30 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3567 [2024-07-31 22:46:25,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3385.54 | bwd_microstep: 5079.12 | bwd_inner_microstep: 5008.51 | bwd_allreduce_microstep: 70.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-31 22:46:34,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.85 | bwd_microstep: 5321.35 | bwd_inner_microstep: 4909.70 | bwd_allreduce_microstep: 411.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-31 22:46:43,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.98 | bwd_microstep: 5200.04 | bwd_inner_microstep: 5116.42 | bwd_allreduce_microstep: 83.55 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2100 [2024-07-31 22:46:52,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.79 | bwd_microstep: 5040.45 | bwd_inner_microstep: 4652.29 | bwd_allreduce_microstep: 388.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 22:47:00,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.35 | bwd_microstep: 4978.08 | bwd_inner_microstep: 4958.68 | 
bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-31 22:47:09,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.06 | bwd_microstep: 4937.31 | bwd_inner_microstep: 4912.79 | bwd_allreduce_microstep: 24.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 22:47:18,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 22:47:18,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.97 | bwd_microstep: 5017.78 | bwd_inner_microstep: 4965.36 | bwd_allreduce_microstep: 52.35 | step_microstep: 181.53 [2024-07-31 22:47:18,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28710.58 | bwd: 40932.23 | bwd_inner: 39741.44 | bwd_allreduce: 1190.30 | step: 182.11 85%|████████▌ | 1047/1230 [20:35:24<3:32:48, 69.77s/it] {'loss': 1.1334, 'learning_rate': 1.1388590564479895e-06, 'epoch': 0.85} 85%|████████▌ | 1047/1230 [20:35:24<3:32:48, 69.77s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3901 [2024-07-31 22:47:27,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3821.95 | bwd_microstep: 5260.44 | bwd_inner_microstep: 5224.55 | bwd_allreduce_microstep: 35.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3803 [2024-07-31 22:47:36,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.62 | bwd_microstep: 5300.86 | bwd_inner_microstep: 5233.64 | bwd_allreduce_microstep: 67.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2334 [2024-07-31 22:47:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.27 | bwd_microstep: 5352.42 | bwd_inner_microstep: 4937.59 | bwd_allreduce_microstep: 414.75 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-31 22:47:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.73 | bwd_microstep: 5184.22 | bwd_inner_microstep: 4780.99 | bwd_allreduce_microstep: 403.16 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2090 [2024-07-31 22:48:02,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.27 | bwd_microstep: 5102.30 | bwd_inner_microstep: 4706.49 | bwd_allreduce_microstep: 395.74 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3683 [2024-07-31 22:48:11,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.43 | bwd_microstep: 5022.65 | bwd_inner_microstep: 4956.24 | bwd_allreduce_microstep: 66.35 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-31 22:48:19,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.20 | bwd_microstep: 5035.98 | bwd_inner_microstep: 4979.44 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-31 22:48:28,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 22:48:28,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.04 | bwd_microstep: 5085.44 | bwd_inner_microstep: 4691.13 | bwd_allreduce_microstep: 394.25 | step_microstep: 182.07 [2024-07-31 22:48:28,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28800.40 | bwd: 41344.30 | bwd_inner: 39510.02 | bwd_allreduce: 1833.80 | step: 182.78 85%|████████▌ | 1048/1230 [20:36:34<3:32:17, 69.98s/it] {'loss': 1.1008, 'learning_rate': 1.1266850583232248e-06, 'epoch': 0.85} 85%|████████▌ | 1048/1230 [20:36:34<3:32:17, 69.98s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3889 [2024-07-31 
22:48:37,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.93 | bwd_microstep: 5459.66 | bwd_inner_microstep: 5374.36 | bwd_allreduce_microstep: 85.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3842 [2024-07-31 22:48:46,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.06 | bwd_microstep: 5140.99 | bwd_inner_microstep: 5101.55 | bwd_allreduce_microstep: 39.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 22:48:55,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.26 | bwd_microstep: 5109.95 | bwd_inner_microstep: 5039.67 | bwd_allreduce_microstep: 70.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-31 22:49:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.30 | bwd_microstep: 4982.17 | bwd_inner_microstep: 4933.01 | bwd_allreduce_microstep: 49.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-31 22:49:12,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.76 | bwd_microstep: 4973.84 | bwd_inner_microstep: 4943.52 | bwd_allreduce_microstep: 30.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-31 22:49:21,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.49 | bwd_microstep: 5018.45 | bwd_inner_microstep: 4965.06 | bwd_allreduce_microstep: 53.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 22:49:29,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.84 | bwd_microstep: 5101.85 | bwd_inner_microstep: 5055.56 | bwd_allreduce_microstep: 46.22 | step_microstep: 0.08 dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3668 [2024-07-31 22:49:38,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 22:49:38,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.10 | bwd_microstep: 4880.30 | bwd_inner_microstep: 4861.00 | bwd_allreduce_microstep: 19.23 | step_microstep: 181.75 [2024-07-31 22:49:38,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28897.64 | bwd: 40667.19 | bwd_inner: 40273.66 | bwd_allreduce: 393.05 | step: 182.33 85%|████████▌ | 1049/1230 [20:37:44<3:31:02, 69.96s/it] {'loss': 1.1489, 'learning_rate': 1.114572592690375e-06, 'epoch': 0.85} 85%|████████▌ | 1049/1230 [20:37:44<3:31:02, 69.96s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3964 [2024-07-31 22:49:47,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.88 | bwd_microstep: 5556.83 | bwd_inner_microstep: 5467.62 | bwd_allreduce_microstep: 89.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3936 [2024-07-31 22:49:56,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.27 | bwd_microstep: 5052.59 | bwd_inner_microstep: 5025.09 | bwd_allreduce_microstep: 27.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-31 22:50:05,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.50 | bwd_microstep: 5030.22 | bwd_inner_microstep: 5010.90 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3754 [2024-07-31 22:50:14,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.43 | bwd_microstep: 5122.20 | bwd_inner_microstep: 5056.43 | bwd_allreduce_microstep: 65.70 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-31 
22:50:22,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.24 | bwd_microstep: 4991.67 | bwd_inner_microstep: 4972.15 | bwd_allreduce_microstep: 19.45 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 22:50:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.38 | bwd_microstep: 4922.55 | bwd_inner_microstep: 4883.78 | bwd_allreduce_microstep: 38.71 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 22:50:40,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.09 | bwd_microstep: 5100.42 | bwd_inner_microstep: 4705.28 | bwd_allreduce_microstep: 395.08 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-31 22:50:48,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 22:50:48,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.95 | bwd_microstep: 4723.06 | bwd_inner_microstep: 4697.89 | bwd_allreduce_microstep: 25.11 | step_microstep: 182.60 [2024-07-31 22:50:48,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28769.64 | bwd: 40499.53 | bwd_inner: 39819.06 | bwd_allreduce: 679.97 | step: 183.30 85%|████████▌ | 1050/1230 [20:38:54<3:29:33, 69.85s/it] {'loss': 1.1248, 'learning_rate': 1.1025217435440116e-06, 'epoch': 0.85} 85%|████████▌ | 1050/1230 [20:38:54<3:29:33, 69.85s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3982 [2024-07-31 22:50:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.25 | bwd_microstep: 5097.33 | bwd_inner_microstep: 5063.53 | bwd_allreduce_microstep: 33.73 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3573 [2024-07-31 22:51:05,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3626.98 | bwd_microstep: 5252.21 | bwd_inner_microstep: 5114.83 | bwd_allreduce_microstep: 137.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-31 22:51:14,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.09 | bwd_microstep: 5033.98 | bwd_inner_microstep: 5013.32 | bwd_allreduce_microstep: 20.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2299 [2024-07-31 22:51:23,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.04 | bwd_microstep: 5225.03 | bwd_inner_microstep: 4819.11 | bwd_allreduce_microstep: 405.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 22:51:32,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.14 | bwd_microstep: 5169.54 | bwd_inner_microstep: 5086.25 | bwd_allreduce_microstep: 83.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 22:51:41,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.38 | bwd_microstep: 5264.21 | bwd_inner_microstep: 4856.96 | bwd_allreduce_microstep: 407.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-31 22:51:49,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.40 | bwd_microstep: 5125.65 | bwd_inner_microstep: 5058.89 | bwd_allreduce_microstep: 66.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 22:51:58,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 22:51:58,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.44 | bwd_microstep: 5092.29 | bwd_inner_microstep: 4695.12 | bwd_allreduce_microstep: 397.10 | step_microstep: 
182.13
[2024-07-31 22:51:58,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28878.62 | bwd: 41260.23 | bwd_inner: 39707.96 | bwd_allreduce: 1551.79 | step: 182.74
85%|████████▌ | 1051/1230 [20:40:04<3:28:56, 70.04s/it]
{'loss': 1.125, 'learning_rate': 1.0905325944514034e-06, 'epoch': 0.85}
dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3586
[2024-07-31 22:52:07,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.48 | bwd_microstep: 5369.07 | bwd_inner_microstep: 5272.59 | bwd_allreduce_microstep: 96.42 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843
[2024-07-31 22:52:16,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.96 | bwd_microstep: 5161.82 | bwd_inner_microstep: 5134.85 | bwd_allreduce_microstep: 26.91 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2833
[2024-07-31 22:52:25,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.78 | bwd_microstep: 5168.64 | bwd_inner_microstep: 4763.60 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601
[2024-07-31 22:52:34,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.72 | bwd_microstep: 5236.50 | bwd_inner_microstep: 5154.70 | bwd_allreduce_microstep: 81.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610
[2024-07-31 22:52:43,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.78 | bwd_microstep: 5148.07 | bwd_inner_microstep: 5067.32 | bwd_allreduce_microstep: 80.68 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3633
[2024-07-31 22:52:51,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3501.02 | bwd_microstep: 4963.76 | bwd_inner_microstep: 4900.66 | bwd_allreduce_microstep: 63.04 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690
[2024-07-31 22:53:00,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.64 | bwd_microstep: 5067.93 | bwd_inner_microstep: 5004.92 | bwd_allreduce_microstep: 62.94 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668
[2024-07-31 22:53:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 22:53:09,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.19 | bwd_microstep: 5110.59 | bwd_inner_microstep: 5045.10 | bwd_allreduce_microstep: 65.42 | step_microstep: 182.28
[2024-07-31 22:53:09,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28911.47 | bwd: 41226.38 | bwd_inner: 40343.69 | bwd_allreduce: 882.22 | step: 182.86
86%|████████▌ | 1052/1230 [20:41:15<3:28:09, 70.17s/it]
{'loss': 1.1097, 'learning_rate': 1.078605228551971e-06, 'epoch': 0.86}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818
[2024-07-31 22:53:17,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.74 | bwd_microstep: 5034.07 | bwd_inner_microstep: 5014.98 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237
[2024-07-31 22:53:26,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.54 | bwd_microstep: 5165.55 | bwd_inner_microstep: 4762.21 | bwd_allreduce_microstep: 403.27 | step_microstep: 0.19
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625
[2024-07-31 22:53:35,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.28 | bwd_microstep: 5190.08 | bwd_inner_microstep:
5112.68 | bwd_allreduce_microstep: 77.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794
[2024-07-31 22:53:44,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.55 | bwd_microstep: 4966.02 | bwd_inner_microstep: 4937.65 | bwd_allreduce_microstep: 28.30 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749
[2024-07-31 22:53:52,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.69 | bwd_microstep: 5053.13 | bwd_inner_microstep: 5026.42 | bwd_allreduce_microstep: 26.64 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661
[2024-07-31 22:54:01,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.44 | bwd_microstep: 5034.02 | bwd_inner_microstep: 4979.22 | bwd_allreduce_microstep: 54.73 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657
[2024-07-31 22:54:10,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.13 | bwd_microstep: 5142.64 | bwd_inner_microstep: 5073.45 | bwd_allreduce_microstep: 69.12 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168
[2024-07-31 22:54:18,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56
[2024-07-31 22:54:18,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3459.81 | bwd_microstep: 5063.56 | bwd_inner_microstep: 4670.95 | bwd_allreduce_microstep: 392.55 | step_microstep: 181.75
[2024-07-31 22:54:18,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28813.09 | bwd: 40649.05 | bwd_inner: 39577.51 | bwd_allreduce: 1071.06 | step: 182.45
86%|████████▌ | 1053/1230 [20:42:24<3:26:40, 70.06s/it]
{'loss': 1.1247, 'learning_rate': 1.0667397285566893e-06, 'epoch': 0.86}
dynamic
ViT batch size: 16, images per sample: 8.0, dynamic token length: 3748
[2024-07-31 22:54:27,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.36 | bwd_microstep: 5351.19 | bwd_inner_microstep: 5260.84 | bwd_allreduce_microstep: 90.28 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795
[2024-07-31 22:54:36,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.04 | bwd_microstep: 5016.16 | bwd_inner_microstep: 4996.77 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202
[2024-07-31 22:54:44,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3012.32 | bwd_microstep: 4878.50 | bwd_inner_microstep: 4502.63 | bwd_allreduce_microstep: 375.81 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212
[2024-07-31 22:54:53,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.37 | bwd_microstep: 5230.75 | bwd_inner_microstep: 4824.84 | bwd_allreduce_microstep: 405.85 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3826
[2024-07-31 22:55:02,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.72 | bwd_microstep: 4928.39 | bwd_inner_microstep: 4903.52 | bwd_allreduce_microstep: 24.80 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179
[2024-07-31 22:55:10,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.16 | bwd_microstep: 5190.49 | bwd_inner_microstep: 4787.27 | bwd_allreduce_microstep: 403.14 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690
[2024-07-31 22:55:19,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.03 | bwd_microstep: 4885.67 | bwd_inner_microstep: 4866.27 |
bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189
[2024-07-31 22:55:28,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:55:28,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.44 | bwd_microstep: 5105.66 | bwd_inner_microstep: 4711.46 | bwd_allreduce_microstep: 394.13 | step_microstep: 182.05
[2024-07-31 22:55:28,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28360.33 | bwd: 40586.79 | bwd_inner: 38853.53 | bwd_allreduce: 1732.77 | step: 182.63
86%|████████▌ | 1054/1230 [20:43:34<3:24:48, 69.82s/it]
{'loss': 1.1651, 'learning_rate': 1.0549361767475252e-06, 'epoch': 0.86}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4065
[2024-07-31 22:55:37,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.17 | bwd_microstep: 5154.74 | bwd_inner_microstep: 5135.67 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807
[2024-07-31 22:55:45,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.01 | bwd_microstep: 5179.92 | bwd_inner_microstep: 5131.47 | bwd_allreduce_microstep: 48.38 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2267
[2024-07-31 22:55:54,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.41 | bwd_microstep: 5254.10 | bwd_inner_microstep: 4845.59 | bwd_allreduce_microstep: 408.43 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3891
[2024-07-31 22:56:03,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.10 | bwd_microstep: 5277.17 | bwd_inner_microstep: 5222.09 | bwd_allreduce_microstep: 55.00 | step_microstep: 0.09
dynamic ViT batch
size: 20, images per sample: 10.0, dynamic token length: 3670
[2024-07-31 22:56:12,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.04 | bwd_microstep: 5027.13 | bwd_inner_microstep: 4975.67 | bwd_allreduce_microstep: 51.40 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734
[2024-07-31 22:56:21,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.21 | bwd_microstep: 5036.38 | bwd_inner_microstep: 5012.91 | bwd_allreduce_microstep: 23.40 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 22:56:29,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.70 | bwd_microstep: 5065.87 | bwd_inner_microstep: 5004.11 | bwd_allreduce_microstep: 61.70 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169
[2024-07-31 22:56:38,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64
[2024-07-31 22:56:38,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.65 | bwd_microstep: 5183.51 | bwd_inner_microstep: 4782.62 | bwd_allreduce_microstep: 400.82 | step_microstep: 181.79
[2024-07-31 22:56:38,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28998.18 | bwd: 41178.81 | bwd_inner: 40110.08 | bwd_allreduce: 1068.23 | step: 182.49
86%|████████▌ | 1055/1230 [20:44:44<3:24:15, 70.03s/it]
{'loss': 1.1221, 'learning_rate': 1.0431946549768567e-06, 'epoch': 0.86}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4020
[2024-07-31 22:56:47,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.07 | bwd_microstep: 5336.38 | bwd_inner_microstep: 5289.01 | bwd_allreduce_microstep: 47.31 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3881
[2024-07-31 22:56:55,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3167.16 | bwd_microstep: 4953.82 | bwd_inner_microstep: 4922.29 | bwd_allreduce_microstep: 31.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250
[2024-07-31 22:57:04,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.95 | bwd_microstep: 5017.90 | bwd_inner_microstep: 4628.59 | bwd_allreduce_microstep: 389.24 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757
[2024-07-31 22:57:13,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.34 | bwd_microstep: 5003.55 | bwd_inner_microstep: 4984.26 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2193
[2024-07-31 22:57:21,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.54 | bwd_microstep: 5184.02 | bwd_inner_microstep: 4781.88 | bwd_allreduce_microstep: 402.07 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3671
[2024-07-31 22:57:30,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.06 | bwd_microstep: 5057.56 | bwd_inner_microstep: 4991.39 | bwd_allreduce_microstep: 66.09 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688
[2024-07-31 22:57:38,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3426.49 | bwd_microstep: 4965.96 | bwd_inner_microstep: 4923.64 | bwd_allreduce_microstep: 42.25 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2112
[2024-07-31 22:57:47,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 22:57:47,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.57 | bwd_microstep: 5225.88 |
bwd_inner_microstep: 4819.40 | bwd_allreduce_microstep: 406.41 | step_microstep: 181.51
[2024-07-31 22:57:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28168.11 | bwd: 40745.04 | bwd_inner: 39340.40 | bwd_allreduce: 1404.16 | step: 182.09
86%|████████▌ | 1056/1230 [20:45:53<3:22:24, 69.79s/it]
{'loss': 1.1349, 'learning_rate': 1.0315152446669142e-06, 'epoch': 0.86}
dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3933
[2024-07-31 22:57:56,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.34 | bwd_microstep: 5166.52 | bwd_inner_microstep: 5137.31 | bwd_allreduce_microstep: 29.15 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3939
[2024-07-31 22:58:05,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.54 | bwd_microstep: 5146.84 | bwd_inner_microstep: 5112.59 | bwd_allreduce_microstep: 34.18 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585
[2024-07-31 22:58:14,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.36 | bwd_microstep: 5214.56 | bwd_inner_microstep: 5132.04 | bwd_allreduce_microstep: 82.46 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618
[2024-07-31 22:58:23,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.01 | bwd_microstep: 5190.22 | bwd_inner_microstep: 5107.30 | bwd_allreduce_microstep: 82.85 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198
[2024-07-31 22:58:32,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.49 | bwd_microstep: 5217.85 | bwd_inner_microstep: 4813.26 | bwd_allreduce_microstep: 404.53 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623
[2024-07-31 22:58:40,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.18 | bwd_microstep: 5021.61 | bwd_inner_microstep: 4958.54 | bwd_allreduce_microstep: 63.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687
[2024-07-31 22:58:49,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.02 | bwd_microstep: 5038.11 | bwd_inner_microstep: 4982.01 | bwd_allreduce_microstep: 56.04 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662
[2024-07-31 22:58:58,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63
[2024-07-31 22:58:58,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.09 | bwd_microstep: 5020.73 | bwd_inner_microstep: 4968.16 | bwd_allreduce_microstep: 52.50 | step_microstep: 181.51
[2024-07-31 22:58:58,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28818.93 | bwd: 41016.42 | bwd_inner: 40211.13 | bwd_allreduce: 804.82 | step: 182.08
86%|████████▌ | 1057/1230 [20:47:04<3:21:33, 69.91s/it]
{'loss': 1.1144, 'learning_rate': 1.019898026809214e-06, 'epoch': 0.86}
dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3283
[2024-07-31 22:59:07,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.19 | bwd_microstep: 5348.71 | bwd_inner_microstep: 5096.95 | bwd_allreduce_microstep: 251.69 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2066
[2024-07-31 22:59:16,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.39 | bwd_microstep: 5298.41 | bwd_inner_microstep: 4886.38 | bwd_allreduce_microstep: 411.96 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590
[2024-07-31 22:59:24,847] [INFO] [logging.py:96:log_dist] [Rank 0] time
(ms) | fwd_microstep: 3610.84 | bwd_microstep: 5191.85 | bwd_inner_microstep: 5111.78 | bwd_allreduce_microstep: 80.00 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730
[2024-07-31 22:59:33,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.78 | bwd_microstep: 5114.10 | bwd_inner_microstep: 5068.53 | bwd_allreduce_microstep: 45.51 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160
[2024-07-31 22:59:41,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3033.20 | bwd_microstep: 4989.12 | bwd_inner_microstep: 4603.92 | bwd_allreduce_microstep: 385.13 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3729
[2024-07-31 22:59:49,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3107.28 | bwd_microstep: 4915.15 | bwd_inner_microstep: 4874.57 | bwd_allreduce_microstep: 40.52 | step_microstep: 0.19
dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2107
[2024-07-31 22:59:58,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.00 | bwd_microstep: 5099.17 | bwd_inner_microstep: 4703.52 | bwd_allreduce_microstep: 395.58 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668
[2024-07-31 23:00:06,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66
[2024-07-31 23:00:06,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3198.59 | bwd_microstep: 4706.18 | bwd_inner_microstep: 4681.62 | bwd_allreduce_microstep: 24.48 | step_microstep: 181.85
[2024-07-31 23:00:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27272.17 | bwd: 40662.68 | bwd_inner: 39027.21 | bwd_allreduce: 1634.98 | step: 182.55
86%|████████▌ | 1058/1230 [20:48:12<3:18:59, 69.41s/it]
{'loss': 1.1489, 'learning_rate': 1.0083430819639962e-06,
'epoch': 0.86}
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4070
[2024-07-31 23:00:15,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.72 | bwd_microstep: 5384.62 | bwd_inner_microstep: 5334.21 | bwd_allreduce_microstep: 50.35 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815
[2024-07-31 23:00:24,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.22 | bwd_microstep: 5414.30 | bwd_inner_microstep: 5337.25 | bwd_allreduce_microstep: 76.98 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808
[2024-07-31 23:00:33,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.55 | bwd_microstep: 5107.95 | bwd_inner_microstep: 5080.62 | bwd_allreduce_microstep: 27.27 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254
[2024-07-31 23:00:41,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.64 | bwd_microstep: 4998.47 | bwd_inner_microstep: 4612.87 | bwd_allreduce_microstep: 385.54 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2090
[2024-07-31 23:00:50,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.42 | bwd_microstep: 5063.93 | bwd_inner_microstep: 4668.80 | bwd_allreduce_microstep: 395.06 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722
[2024-07-31 23:00:58,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.26 | bwd_microstep: 4971.26 | bwd_inner_microstep: 4951.86 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676
[2024-07-31 23:01:07,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 3582.43 | bwd_microstep: 5042.98 | bwd_inner_microstep: 4988.05 | bwd_allreduce_microstep: 54.87 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669
[2024-07-31 23:01:16,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62
[2024-07-31 23:01:16,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.69 | bwd_microstep: 5029.63 | bwd_inner_microstep: 4971.89 | bwd_allreduce_microstep: 57.67 | step_microstep: 181.61
[2024-07-31 23:01:16,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28508.83 | bwd: 41013.13 | bwd_inner: 39945.48 | bwd_allreduce: 1067.16 | step: 182.22
86%|████████▌ | 1059/1230 [20:49:22<3:18:12, 69.55s/it]
{'loss': 1.1133, 'learning_rate': 9.968504902596576e-07, 'epoch': 0.86}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3977
[2024-07-31 23:01:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.16 | bwd_microstep: 5355.06 | bwd_inner_microstep: 5314.53 | bwd_allreduce_microstep: 40.45 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831
[2024-07-31 23:01:33,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.17 | bwd_microstep: 4913.33 | bwd_inner_microstep: 4888.05 | bwd_allreduce_microstep: 25.21 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3880
[2024-07-31 23:01:42,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.24 | bwd_microstep: 5160.97 | bwd_inner_microstep: 5119.58 | bwd_allreduce_microstep: 41.32 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751
[2024-07-31 23:01:51,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.46 | bwd_microstep: 5042.03 | bwd_inner_microstep:
5017.61 | bwd_allreduce_microstep: 24.35 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205
[2024-07-31 23:01:59,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.36 | bwd_microstep: 5214.78 | bwd_inner_microstep: 4809.21 | bwd_allreduce_microstep: 405.50 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712
[2024-07-31 23:02:08,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.32 | bwd_microstep: 4920.70 | bwd_inner_microstep: 4897.17 | bwd_allreduce_microstep: 23.46 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167
[2024-07-31 23:02:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.98 | bwd_microstep: 5069.52 | bwd_inner_microstep: 4675.21 | bwd_allreduce_microstep: 394.24 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678
[2024-07-31 23:02:25,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54
[2024-07-31 23:02:25,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.69 | bwd_microstep: 5076.08 | bwd_inner_microstep: 5013.34 | bwd_allreduce_microstep: 62.67 | step_microstep: 181.45
[2024-07-31 23:02:25,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28628.27 | bwd: 40752.45 | bwd_inner: 39734.65 | bwd_allreduce: 1017.33 | step: 182.02
86%|████████▌ | 1060/1230 [20:50:31<3:17:11, 69.60s/it]
{'loss': 1.1033, 'learning_rate': 9.85420331392214e-07, 'epoch': 0.86}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3974
[2024-07-31 23:02:35,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3890.33 | bwd_microstep: 5339.67 | bwd_inner_microstep: 5311.11 | bwd_allreduce_microstep: 28.49 | step_microstep: 0.08
dynamic
ViT batch size: 20, images per sample: 10.0, dynamic token length: 4040
[2024-07-31 23:02:44,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.10 | bwd_microstep: 5207.39 | bwd_inner_microstep: 5184.98 | bwd_allreduce_microstep: 22.35 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4019
[2024-07-31 23:02:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.32 | bwd_microstep: 5279.28 | bwd_inner_microstep: 5259.89 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220
[2024-07-31 23:03:02,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.28 | bwd_microstep: 5243.74 | bwd_inner_microstep: 4838.74 | bwd_allreduce_microstep: 404.93 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3640
[2024-07-31 23:03:10,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.11 | bwd_microstep: 5161.71 | bwd_inner_microstep: 5063.08 | bwd_allreduce_microstep: 98.56 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3649
[2024-07-31 23:03:19,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.14 | bwd_microstep: 5128.07 | bwd_inner_microstep: 5046.32 | bwd_allreduce_microstep: 81.68 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2159
[2024-07-31 23:03:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.61 | bwd_microstep: 5063.12 | bwd_inner_microstep: 4671.39 | bwd_allreduce_microstep: 391.67 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658
[2024-07-31 23:03:37,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58
[2024-07-31 23:03:37,057] [INFO] [logging.py:96:log_dist] [Rank
0] time (ms) | fwd_microstep: 3574.96 | bwd_microstep: 5109.90 | bwd_inner_microstep: 5041.88 | bwd_allreduce_microstep: 67.94 | step_microstep: 181.74
[2024-07-31 23:03:37,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29237.76 | bwd: 41532.87 | bwd_inner: 40417.32 | bwd_allreduce: 1115.06 | step: 182.34
86%|████████▋ | 1061/1230 [20:51:42<3:17:18, 70.05s/it]
{'loss': 1.109, 'learning_rate': 9.74052684624731e-07, 'epoch': 0.86}
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3829
[2024-07-31 23:03:45,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.04 | bwd_microstep: 5121.27 | bwd_inner_microstep: 5077.38 | bwd_allreduce_microstep: 43.82 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2036
[2024-07-31 23:03:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.62 | bwd_microstep: 5276.58 | bwd_inner_microstep: 4870.70 | bwd_allreduce_microstep: 405.81 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814
[2024-07-31 23:04:03,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.68 | bwd_microstep: 5334.34 | bwd_inner_microstep: 5281.63 | bwd_allreduce_microstep: 52.64 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3821
[2024-07-31 23:04:12,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.66 | bwd_microstep: 5117.10 | bwd_inner_microstep: 5058.91 | bwd_allreduce_microstep: 58.13 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697
[2024-07-31 23:04:21,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.85 | bwd_microstep: 4998.89 | bwd_inner_microstep: 4964.50 | bwd_allreduce_microstep: 34.32 | step_microstep: 0.09
dynamic ViT batch size:
20, images per sample: 10.0, dynamic token length: 3599
[2024-07-31 23:04:30,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.89 | bwd_microstep: 5177.59 | bwd_inner_microstep: 5090.99 | bwd_allreduce_microstep: 86.53 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3694
[2024-07-31 23:04:38,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.86 | bwd_microstep: 4995.23 | bwd_inner_microstep: 4929.79 | bwd_allreduce_microstep: 65.37 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158
[2024-07-31 23:04:47,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55
[2024-07-31 23:04:47,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.69 | bwd_microstep: 5121.01 | bwd_inner_microstep: 4722.91 | bwd_allreduce_microstep: 398.02 | step_microstep: 181.31
[2024-07-31 23:04:47,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28922.21 | bwd: 41141.98 | bwd_inner: 39996.75 | bwd_allreduce: 1144.75 | step: 182.00
86%|████████▋ | 1062/1230 [20:52:53<3:16:25, 70.15s/it]
{'loss': 1.1, 'learning_rate': 9.62747628786782e-07, 'epoch': 0.86}
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3963
[2024-07-31 23:04:56,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.40 | bwd_microstep: 5201.01 | bwd_inner_microstep: 5181.84 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570
[2024-07-31 23:05:05,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.13 | bwd_microstep: 5203.35 | bwd_inner_microstep: 5115.08 | bwd_allreduce_microstep: 88.21 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2285
[2024-07-31
23:05:14,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.99 | bwd_microstep: 5203.42 | bwd_inner_microstep: 4801.03 | bwd_allreduce_microstep: 402.32 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3965
[2024-07-31 23:05:22,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.02 | bwd_microstep: 5043.70 | bwd_inner_microstep: 5020.47 | bwd_allreduce_microstep: 23.16 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761
[2024-07-31 23:05:31,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.18 | bwd_microstep: 5112.05 | bwd_inner_microstep: 5066.58 | bwd_allreduce_microstep: 45.39 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192
[2024-07-31 23:05:40,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.42 | bwd_microstep: 5171.34 | bwd_inner_microstep: 4769.81 | bwd_allreduce_microstep: 401.47 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691
[2024-07-31 23:05:49,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.97 | bwd_microstep: 5073.81 | bwd_inner_microstep: 5027.79 | bwd_allreduce_microstep: 45.95 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653
[2024-07-31 23:05:57,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51
[2024-07-31 23:05:57,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.88 | bwd_microstep: 5018.31 | bwd_inner_microstep: 4966.56 | bwd_allreduce_microstep: 51.68 | step_microstep: 182.09
[2024-07-31 23:05:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28996.90 | bwd: 41026.97 | bwd_inner: 39949.09 | bwd_allreduce: 1077.38 | step: 182.65
86%|████████▋ | 1063/1230 [20:54:03<3:15:25,
70.21s/it] {'loss': 1.2168, 'learning_rate': 9.515052422739013e-07, 'epoch': 0.86} 86%|████████▋ | 1063/1230 [20:54:03<3:15:25, 70.21s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819 [2024-07-31 23:06:06,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.00 | bwd_microstep: 5197.81 | bwd_inner_microstep: 5164.15 | bwd_allreduce_microstep: 33.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-31 23:06:15,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.72 | bwd_microstep: 5252.16 | bwd_inner_microstep: 5163.99 | bwd_allreduce_microstep: 88.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3817 [2024-07-31 23:06:24,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.63 | bwd_microstep: 5277.45 | bwd_inner_microstep: 5197.82 | bwd_allreduce_microstep: 79.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765 [2024-07-31 23:06:33,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.46 | bwd_microstep: 5116.23 | bwd_inner_microstep: 5085.77 | bwd_allreduce_microstep: 30.39 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1606 [2024-07-31 23:06:41,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3066.93 | bwd_microstep: 5168.25 | bwd_inner_microstep: 4774.58 | bwd_allreduce_microstep: 393.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-31 23:06:50,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.85 | bwd_microstep: 4915.77 | bwd_inner_microstep: 4890.48 | bwd_allreduce_microstep: 25.22 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3704 [2024-07-31 
23:06:58,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.97 | bwd_microstep: 4952.17 | bwd_inner_microstep: 4893.93 | bwd_allreduce_microstep: 58.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-31 23:07:07,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-31 23:07:07,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.06 | bwd_microstep: 5182.30 | bwd_inner_microstep: 4779.72 | bwd_allreduce_microstep: 402.51 | step_microstep: 181.04 [2024-07-31 23:07:07,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28656.54 | bwd: 41062.11 | bwd_inner: 39950.38 | bwd_allreduce: 1111.27 | step: 181.62 87%|████████▋ | 1064/1230 [20:55:13<3:14:07, 70.16s/it] {'loss': 1.1188, 'learning_rate': 9.403256030470386e-07, 'epoch': 0.86} 87%|████████▋ | 1064/1230 [20:55:13<3:14:07, 70.16s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4079 [2024-07-31 23:07:17,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3862.10 | bwd_microstep: 5328.10 | bwd_inner_microstep: 5308.98 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-31 23:07:25,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.39 | bwd_microstep: 5116.79 | bwd_inner_microstep: 5097.42 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-31 23:07:35,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.15 | bwd_microstep: 5324.02 | bwd_inner_microstep: 5253.87 | bwd_allreduce_microstep: 70.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3969 [2024-07-31 23:07:43,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3705.03 | bwd_microstep: 5127.26 | bwd_inner_microstep: 5097.84 | bwd_allreduce_microstep: 29.35 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3322 [2024-07-31 23:07:52,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.04 | bwd_microstep: 5234.71 | bwd_inner_microstep: 5059.39 | bwd_allreduce_microstep: 175.26 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669 [2024-07-31 23:08:01,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.38 | bwd_microstep: 5006.39 | bwd_inner_microstep: 4968.13 | bwd_allreduce_microstep: 38.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 23:08:10,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.14 | bwd_microstep: 5165.19 | bwd_inner_microstep: 5111.61 | bwd_allreduce_microstep: 53.51 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-31 23:08:18,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 23:08:18,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.77 | bwd_microstep: 4736.54 | bwd_inner_microstep: 4709.81 | bwd_allreduce_microstep: 26.66 | step_microstep: 181.26 [2024-07-31 23:08:18,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29151.88 | bwd: 41038.98 | bwd_inner: 40607.00 | bwd_allreduce: 431.48 | step: 181.97 87%|████████▋ | 1065/1230 [20:56:24<3:13:15, 70.28s/it] {'loss': 1.1206, 'learning_rate': 9.292087886320166e-07, 'epoch': 0.87} 87%|████████▋ | 1065/1230 [20:56:24<3:13:15, 70.28s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3857 [2024-07-31 23:08:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.93 | bwd_microstep: 5504.58 | bwd_inner_microstep: 
5415.34 | bwd_allreduce_microstep: 89.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3735 [2024-07-31 23:08:36,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.06 | bwd_microstep: 5116.88 | bwd_inner_microstep: 5044.31 | bwd_allreduce_microstep: 72.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789 [2024-07-31 23:08:45,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.60 | bwd_microstep: 5051.15 | bwd_inner_microstep: 5027.62 | bwd_allreduce_microstep: 23.45 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-31 23:08:53,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.34 | bwd_microstep: 5208.28 | bwd_inner_microstep: 4802.83 | bwd_allreduce_microstep: 405.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 23:09:02,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.68 | bwd_microstep: 5167.19 | bwd_inner_microstep: 5111.59 | bwd_allreduce_microstep: 55.53 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 23:09:11,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.19 | bwd_microstep: 5012.98 | bwd_inner_microstep: 4989.13 | bwd_allreduce_microstep: 23.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 23:09:20,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.01 | bwd_microstep: 5114.96 | bwd_inner_microstep: 5068.92 | bwd_allreduce_microstep: 45.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-31 23:09:29,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_step: 92.68 [2024-07-31 23:09:29,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.98 | bwd_microstep: 5100.38 | bwd_inner_microstep: 4704.47 | bwd_allreduce_microstep: 395.84 | step_microstep: 181.88 [2024-07-31 23:09:29,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29099.71 | bwd: 41276.38 | bwd_inner: 40164.15 | bwd_allreduce: 1111.75 | step: 182.47 87%|████████▋ | 1066/1230 [20:57:34<3:12:26, 70.41s/it] {'loss': 1.132, 'learning_rate': 9.181548761189996e-07, 'epoch': 0.87} 87%|████████▋ | 1066/1230 [20:57:34<3:12:26, 70.41s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2340 [2024-07-31 23:09:37,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3136.02 | bwd_microstep: 5225.72 | bwd_inner_microstep: 4825.78 | bwd_allreduce_microstep: 399.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-31 23:09:46,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.06 | bwd_microstep: 5178.05 | bwd_inner_microstep: 5089.13 | bwd_allreduce_microstep: 88.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3803 [2024-07-31 23:09:55,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.04 | bwd_microstep: 5020.01 | bwd_inner_microstep: 5000.66 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-31 23:10:03,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.23 | bwd_microstep: 5148.64 | bwd_inner_microstep: 4748.39 | bwd_allreduce_microstep: 400.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-31 23:10:12,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.61 | bwd_microstep: 5114.29 | bwd_inner_microstep: 
5044.79 | bwd_allreduce_microstep: 69.43 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2186 [2024-07-31 23:10:21,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.30 | bwd_microstep: 5052.54 | bwd_inner_microstep: 4658.83 | bwd_allreduce_microstep: 393.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-31 23:10:29,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.52 | bwd_microstep: 5043.43 | bwd_inner_microstep: 4986.11 | bwd_allreduce_microstep: 57.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3716 [2024-07-31 23:10:38,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 23:10:38,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.78 | bwd_microstep: 5020.78 | bwd_inner_microstep: 4963.92 | bwd_allreduce_microstep: 56.79 | step_microstep: 181.32 [2024-07-31 23:10:38,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28208.47 | bwd: 40803.44 | bwd_inner: 39317.56 | bwd_allreduce: 1485.38 | step: 181.90 87%|████████▋ | 1067/1230 [20:58:44<3:10:23, 70.09s/it] {'loss': 1.1774, 'learning_rate': 9.071639421619527e-07, 'epoch': 0.87} 87%|████████▋ | 1067/1230 [20:58:44<3:10:23, 70.09s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3986 [2024-07-31 23:10:47,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3896.66 | bwd_microstep: 5343.73 | bwd_inner_microstep: 5318.42 | bwd_allreduce_microstep: 25.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782 [2024-07-31 23:10:56,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.14 | bwd_microstep: 5137.47 | bwd_inner_microstep: 5102.81 | bwd_allreduce_microstep: 34.60 | step_microstep: 0.08 dynamic 
ViT batch size: 24, images per sample: 12.0, dynamic token length: 3822 [2024-07-31 23:11:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.76 | bwd_microstep: 5050.42 | bwd_inner_microstep: 5024.29 | bwd_allreduce_microstep: 26.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228 [2024-07-31 23:11:13,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3438.30 | bwd_microstep: 5003.43 | bwd_inner_microstep: 4615.94 | bwd_allreduce_microstep: 387.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 23:11:22,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.09 | bwd_microstep: 5053.95 | bwd_inner_microstep: 4994.55 | bwd_allreduce_microstep: 59.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-31 23:11:31,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.33 | bwd_microstep: 5057.76 | bwd_inner_microstep: 4997.54 | bwd_allreduce_microstep: 60.14 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3680 [2024-07-31 23:11:39,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.04 | bwd_microstep: 5055.26 | bwd_inner_microstep: 5002.35 | bwd_allreduce_microstep: 52.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3661 [2024-07-31 23:11:48,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 23:11:48,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.46 | bwd_microstep: 5132.19 | bwd_inner_microstep: 5040.85 | bwd_allreduce_microstep: 91.26 | step_microstep: 181.43 [2024-07-31 23:11:48,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29178.70 | bwd: 40834.19 | bwd_inner: 40096.68 | 
bwd_allreduce: 737.02 | step: 182.02 87%|████████▋ | 1068/1230 [20:59:54<3:09:26, 70.16s/it] {'loss': 1.1345, 'learning_rate': 8.962360629781153e-07, 'epoch': 0.87} 87%|████████▋ | 1068/1230 [20:59:54<3:09:26, 70.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3984 [2024-07-31 23:11:58,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.64 | bwd_microstep: 5534.90 | bwd_inner_microstep: 5467.23 | bwd_allreduce_microstep: 67.60 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2790 [2024-07-31 23:12:07,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.49 | bwd_microstep: 5343.32 | bwd_inner_microstep: 4927.14 | bwd_allreduce_microstep: 416.11 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-31 23:12:15,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.29 | bwd_microstep: 5177.09 | bwd_inner_microstep: 5091.33 | bwd_allreduce_microstep: 85.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 23:12:24,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.25 | bwd_microstep: 5019.54 | bwd_inner_microstep: 5000.05 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2884 [2024-07-31 23:12:33,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.64 | bwd_microstep: 5282.33 | bwd_inner_microstep: 4871.42 | bwd_allreduce_microstep: 410.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-31 23:12:42,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.12 | bwd_microstep: 4982.98 | bwd_inner_microstep: 4963.74 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.10 dynamic ViT 
batch size: 6, images per sample: 3.0, dynamic token length: 1132 [2024-07-31 23:12:50,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3445.84 | bwd_microstep: 5103.25 | bwd_inner_microstep: 4709.97 | bwd_allreduce_microstep: 393.21 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2177 [2024-07-31 23:12:59,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-31 23:12:59,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.30 | bwd_microstep: 5123.94 | bwd_inner_microstep: 4726.97 | bwd_allreduce_microstep: 396.90 | step_microstep: 183.47 [2024-07-31 23:12:59,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28994.46 | bwd: 41567.33 | bwd_inner: 39757.79 | bwd_allreduce: 1809.06 | step: 184.07 87%|████████▋ | 1069/1230 [21:01:05<3:08:51, 70.38s/it] {'loss': 1.1571, 'learning_rate': 8.853713143474685e-07, 'epoch': 0.87} 87%|████████▋ | 1069/1230 [21:01:05<3:08:51, 70.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3931 [2024-07-31 23:13:08,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3364.02 | bwd_microstep: 5279.92 | bwd_inner_microstep: 5219.43 | bwd_allreduce_microstep: 60.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3923 [2024-07-31 23:13:17,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.35 | bwd_microstep: 5039.02 | bwd_inner_microstep: 5010.47 | bwd_allreduce_microstep: 28.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3598 [2024-07-31 23:13:25,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3165.16 | bwd_microstep: 5173.35 | bwd_inner_microstep: 5083.43 | bwd_allreduce_microstep: 89.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 
[2024-07-31 23:13:34,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.94 | bwd_microstep: 5026.32 | bwd_inner_microstep: 4988.87 | bwd_allreduce_microstep: 37.38 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-31 23:13:42,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.46 | bwd_microstep: 5163.97 | bwd_inner_microstep: 5107.05 | bwd_allreduce_microstep: 56.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-31 23:13:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.30 | bwd_microstep: 5009.70 | bwd_inner_microstep: 4958.45 | bwd_allreduce_microstep: 51.18 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3679 [2024-07-31 23:14:00,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.83 | bwd_microstep: 5126.64 | bwd_inner_microstep: 5037.53 | bwd_allreduce_microstep: 89.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2138 [2024-07-31 23:14:08,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 23:14:08,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3437.08 | bwd_microstep: 4998.57 | bwd_inner_microstep: 4611.33 | bwd_allreduce_microstep: 387.17 | step_microstep: 181.55 [2024-07-31 23:14:08,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27914.04 | bwd: 40817.45 | bwd_inner: 40016.50 | bwd_allreduce: 800.47 | step: 182.13 87%|████████▋ | 1070/1230 [21:02:14<3:06:37, 69.99s/it] {'loss': 1.1471, 'learning_rate': 8.745697716122081e-07, 'epoch': 0.87} 87%|████████▋ | 1070/1230 [21:02:14<3:06:37, 69.99s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4020 [2024-07-31 23:14:17,856] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3828.17 | bwd_microstep: 5265.72 | bwd_inner_microstep: 5246.67 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2054 [2024-07-31 23:14:26,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.04 | bwd_microstep: 5240.71 | bwd_inner_microstep: 4834.62 | bwd_allreduce_microstep: 406.02 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3252 [2024-07-31 23:14:35,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.39 | bwd_microstep: 5186.73 | bwd_inner_microstep: 4989.08 | bwd_allreduce_microstep: 197.57 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-31 23:14:44,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.07 | bwd_microstep: 5139.79 | bwd_inner_microstep: 5062.58 | bwd_allreduce_microstep: 77.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-31 23:14:53,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.10 | bwd_microstep: 5146.34 | bwd_inner_microstep: 5090.04 | bwd_allreduce_microstep: 56.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-31 23:15:00,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.07 | bwd_microstep: 4757.72 | bwd_inner_microstep: 4732.92 | bwd_allreduce_microstep: 24.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-31 23:15:09,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.94 | bwd_microstep: 5058.62 | bwd_inner_microstep: 4997.18 | bwd_allreduce_microstep: 61.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-31 
23:15:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-31 23:15:18,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3454.87 | bwd_microstep: 5048.23 | bwd_inner_microstep: 4657.83 | bwd_allreduce_microstep: 390.33 | step_microstep: 183.08 [2024-07-31 23:15:18,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28453.56 | bwd: 40843.82 | bwd_inner: 39610.87 | bwd_allreduce: 1232.46 | step: 183.65 87%|████████▋ | 1071/1230 [21:03:24<3:05:10, 69.88s/it] {'loss': 1.0699, 'learning_rate': 8.638315096762306e-07, 'epoch': 0.87} 87%|████████▋ | 1071/1230 [21:03:24<3:05:10, 69.88s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822 [2024-07-31 23:15:27,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.36 | bwd_microstep: 5059.30 | bwd_inner_microstep: 5040.11 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2314 [2024-07-31 23:15:36,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.97 | bwd_microstep: 5256.17 | bwd_inner_microstep: 4849.28 | bwd_allreduce_microstep: 406.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-31 23:15:44,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3406.45 | bwd_microstep: 5053.35 | bwd_inner_microstep: 5015.36 | bwd_allreduce_microstep: 37.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 23:15:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.48 | bwd_microstep: 4988.02 | bwd_inner_microstep: 4968.55 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-31 23:16:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3666.42 | bwd_microstep: 4882.42 | bwd_inner_microstep: 4863.08 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 23:16:10,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.69 | bwd_microstep: 4998.96 | bwd_inner_microstep: 4979.62 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-31 23:16:19,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.58 | bwd_microstep: 5222.47 | bwd_inner_microstep: 4817.23 | bwd_allreduce_microstep: 405.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-31 23:16:28,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 23:16:28,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.87 | bwd_microstep: 5177.58 | bwd_inner_microstep: 5100.36 | bwd_allreduce_microstep: 77.15 | step_microstep: 181.55 [2024-07-31 23:16:28,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29049.73 | bwd: 40638.25 | bwd_inner: 39633.54 | bwd_allreduce: 1004.22 | step: 182.12 87%|████████▋ | 1072/1230 [21:04:34<3:04:07, 69.92s/it] {'loss': 1.1462, 'learning_rate': 8.531566030046035e-07, 'epoch': 0.87} 87%|████████▋ | 1072/1230 [21:04:34<3:04:07, 69.92s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3865 [2024-07-31 23:16:37,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.83 | bwd_microstep: 5207.24 | bwd_inner_microstep: 5155.93 | bwd_allreduce_microstep: 51.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-31 23:16:46,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.24 | bwd_microstep: 5294.83 | bwd_inner_microstep: 
5229.66 | bwd_allreduce_microstep: 65.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-31 23:16:54,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.85 | bwd_microstep: 5044.49 | bwd_inner_microstep: 5025.14 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814 [2024-07-31 23:17:03,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.43 | bwd_microstep: 5047.28 | bwd_inner_microstep: 5027.95 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 23:17:12,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.30 | bwd_microstep: 5005.02 | bwd_inner_microstep: 4954.72 | bwd_allreduce_microstep: 50.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-31 23:17:21,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.80 | bwd_microstep: 4988.56 | bwd_inner_microstep: 4969.20 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-31 23:17:29,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.94 | bwd_microstep: 4916.44 | bwd_inner_microstep: 4892.92 | bwd_allreduce_microstep: 23.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 23:17:38,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.82 [2024-07-31 23:17:38,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.29 | bwd_microstep: 4904.65 | bwd_inner_microstep: 4885.20 | bwd_allreduce_microstep: 19.37 | step_microstep: 182.71 [2024-07-31 23:17:38,567] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 29436.58 | bwd: 40408.50 | bwd_inner: 40140.67 | bwd_allreduce: 267.33 | step: 183.29 87%|████████▋ | 1073/1230 [21:05:44<3:03:09, 70.00s/it] {'loss': 1.1512, 'learning_rate': 8.42545125623061e-07, 'epoch': 0.87} 87%|████████▋ | 1073/1230 [21:05:44<3:03:09, 70.00s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2405 [2024-07-31 23:17:46,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3084.76 | bwd_microstep: 5142.69 | bwd_inner_microstep: 4751.37 | bwd_allreduce_microstep: 391.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-31 23:17:55,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3362.27 | bwd_microstep: 4965.37 | bwd_inner_microstep: 4935.18 | bwd_allreduce_microstep: 30.12 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 23:18:03,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.60 | bwd_microstep: 5082.69 | bwd_inner_microstep: 5041.54 | bwd_allreduce_microstep: 41.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882 [2024-07-31 23:18:12,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.42 | bwd_microstep: 5006.54 | bwd_inner_microstep: 4981.55 | bwd_allreduce_microstep: 24.93 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-31 23:18:21,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.52 | bwd_microstep: 5370.82 | bwd_inner_microstep: 4955.12 | bwd_allreduce_microstep: 415.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-31 23:18:30,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.29 | bwd_microstep: 4975.59 | bwd_inner_microstep: 4956.27 | 
bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-31 23:18:38,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.78 | bwd_microstep: 4878.15 | bwd_inner_microstep: 4501.24 | bwd_allreduce_microstep: 376.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-31 23:18:46,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 23:18:46,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.80 | bwd_microstep: 4881.13 | bwd_inner_microstep: 4861.76 | bwd_allreduce_microstep: 19.29 | step_microstep: 181.35 [2024-07-31 23:18:46,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27687.34 | bwd: 40302.95 | bwd_inner: 38983.98 | bwd_allreduce: 1318.47 | step: 182.03 87%|████████▋ | 1074/1230 [21:06:52<3:00:41, 69.50s/it] {'loss': 1.1104, 'learning_rate': 8.319971511174696e-07, 'epoch': 0.87} 87%|████████▋ | 1074/1230 [21:06:52<3:00:41, 69.50s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3921 [2024-07-31 23:18:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.83 | bwd_microstep: 5036.54 | bwd_inner_microstep: 5009.34 | bwd_allreduce_microstep: 27.13 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-31 23:19:03,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3390.64 | bwd_microstep: 4904.90 | bwd_inner_microstep: 4867.46 | bwd_allreduce_microstep: 37.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-31 23:19:12,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.90 | bwd_microstep: 5109.00 | bwd_inner_microstep: 4711.86 | bwd_allreduce_microstep: 397.07 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-31 23:19:21,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.49 | bwd_microstep: 5061.93 | bwd_inner_microstep: 4670.58 | bwd_allreduce_microstep: 391.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 23:19:29,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.89 | bwd_microstep: 5230.80 | bwd_inner_microstep: 4826.09 | bwd_allreduce_microstep: 404.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-31 23:19:38,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.12 | bwd_microstep: 5123.26 | bwd_inner_microstep: 4726.36 | bwd_allreduce_microstep: 396.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-31 23:19:46,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.94 | bwd_microstep: 4788.88 | bwd_inner_microstep: 4768.78 | bwd_allreduce_microstep: 20.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-31 23:19:55,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-31 23:19:55,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.88 | bwd_microstep: 5111.14 | bwd_inner_microstep: 5065.46 | bwd_allreduce_microstep: 45.62 | step_microstep: 181.82 [2024-07-31 23:19:55,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27943.59 | bwd: 40366.43 | bwd_inner: 38645.86 | bwd_allreduce: 1720.09 | step: 182.40 87%|████████▋ | 1075/1230 [21:08:01<2:58:52, 69.24s/it] {'loss': 1.1239, 'learning_rate': 8.215127526333499e-07, 'epoch': 0.87} 87%|████████▋ | 1075/1230 [21:08:01<2:58:52, 69.24s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4007 
[2024-07-31 23:20:04,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.19 | bwd_microstep: 5315.07 | bwd_inner_microstep: 5273.15 | bwd_allreduce_microstep: 41.85 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1276 [2024-07-31 23:20:13,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.48 | bwd_microstep: 5429.89 | bwd_inner_microstep: 5011.59 | bwd_allreduce_microstep: 418.23 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2032 [2024-07-31 23:20:22,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.07 | bwd_microstep: 5144.89 | bwd_inner_microstep: 4747.34 | bwd_allreduce_microstep: 397.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-31 23:20:30,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.19 | bwd_microstep: 4985.29 | bwd_inner_microstep: 4965.89 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 23:20:39,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.01 | bwd_microstep: 5163.52 | bwd_inner_microstep: 5083.17 | bwd_allreduce_microstep: 80.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 23:20:48,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.31 | bwd_microstep: 5179.75 | bwd_inner_microstep: 5103.13 | bwd_allreduce_microstep: 76.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3677 [2024-07-31 23:20:57,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.19 | bwd_microstep: 5046.95 | bwd_inner_microstep: 4973.32 | bwd_allreduce_microstep: 73.54 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3664 [2024-07-31 23:21:06,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 23:21:06,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.67 | bwd_microstep: 5031.39 | bwd_inner_microstep: 4976.90 | bwd_allreduce_microstep: 54.42 | step_microstep: 181.97 [2024-07-31 23:21:06,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28897.00 | bwd: 41296.72 | bwd_inner: 40134.42 | bwd_allreduce: 1161.80 | step: 182.54 87%|████████▋ | 1076/1230 [21:09:11<2:58:42, 69.62s/it] {'loss': 1.1659, 'learning_rate': 8.110920028753355e-07, 'epoch': 0.87} 87%|████████▋ | 1076/1230 [21:09:11<2:58:42, 69.62s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-31 23:21:15,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.38 | bwd_microstep: 5591.91 | bwd_inner_microstep: 5433.14 | bwd_allreduce_microstep: 158.71 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2037 [2024-07-31 23:21:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.28 | bwd_microstep: 5224.21 | bwd_inner_microstep: 4820.20 | bwd_allreduce_microstep: 403.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3802 [2024-07-31 23:21:32,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.93 | bwd_microstep: 4991.89 | bwd_inner_microstep: 4959.72 | bwd_allreduce_microstep: 32.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-31 23:21:40,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3244.09 | bwd_microstep: 4866.06 | bwd_inner_microstep: 4837.79 | bwd_allreduce_microstep: 28.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3741 [2024-07-31 
23:21:49,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.49 | bwd_microstep: 5168.97 | bwd_inner_microstep: 5082.56 | bwd_allreduce_microstep: 86.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-31 23:21:58,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.09 | bwd_microstep: 5129.83 | bwd_inner_microstep: 5073.97 | bwd_allreduce_microstep: 55.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-31 23:22:07,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.35 | bwd_microstep: 5151.60 | bwd_inner_microstep: 5083.37 | bwd_allreduce_microstep: 68.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-31 23:22:16,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-31 23:22:16,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.31 | bwd_microstep: 4875.03 | bwd_inner_microstep: 4855.63 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.17 [2024-07-31 23:22:16,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28722.82 | bwd: 40999.48 | bwd_inner: 40146.32 | bwd_allreduce: 852.66 | step: 181.73 88%|████████▊ | 1077/1230 [21:10:21<2:57:52, 69.75s/it] {'loss': 1.0606, 'learning_rate': 8.007349741066939e-07, 'epoch': 0.88} 88%|████████▊ | 1077/1230 [21:10:21<2:57:52, 69.75s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-31 23:22:25,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.84 | bwd_microstep: 5236.75 | bwd_inner_microstep: 5184.21 | bwd_allreduce_microstep: 52.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3566 [2024-07-31 23:22:33,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3342.06 | bwd_microstep: 5022.87 | bwd_inner_microstep: 4960.37 | bwd_allreduce_microstep: 62.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-31 23:22:42,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.61 | bwd_microstep: 5109.02 | bwd_inner_microstep: 5036.41 | bwd_allreduce_microstep: 72.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 23:22:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3005.20 | bwd_microstep: 4843.73 | bwd_inner_microstep: 4469.19 | bwd_allreduce_microstep: 374.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-31 23:22:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.07 | bwd_microstep: 5028.33 | bwd_inner_microstep: 4992.26 | bwd_allreduce_microstep: 36.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 23:23:07,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.70 | bwd_microstep: 5003.15 | bwd_inner_microstep: 4951.18 | bwd_allreduce_microstep: 51.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-31 23:23:15,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.25 | bwd_microstep: 4880.19 | bwd_inner_microstep: 4860.84 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-31 23:23:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-31 23:23:24,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.32 | bwd_microstep: 5222.12 | bwd_inner_microstep: 4817.00 | bwd_allreduce_microstep: 405.05 | step_microstep: 
182.21 [2024-07-31 23:23:24,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27986.95 | bwd: 40346.15 | bwd_inner: 39271.39 | bwd_allreduce: 1074.28 | step: 182.79 88%|████████▊ | 1078/1230 [21:11:30<2:55:52, 69.43s/it] {'loss': 1.1624, 'learning_rate': 7.904417381488072e-07, 'epoch': 0.88} 88%|████████▊ | 1078/1230 [21:11:30<2:55:52, 69.43s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3901 [2024-07-31 23:23:34,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.36 | bwd_microstep: 5420.20 | bwd_inner_microstep: 5365.47 | bwd_allreduce_microstep: 54.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-31 23:23:43,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.78 | bwd_microstep: 5522.97 | bwd_inner_microstep: 5427.56 | bwd_allreduce_microstep: 95.35 | step_microstep: 0.09 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2862 [2024-07-31 23:23:52,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.87 | bwd_microstep: 5223.59 | bwd_inner_microstep: 4817.30 | bwd_allreduce_microstep: 406.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-31 23:24:00,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.18 | bwd_microstep: 5185.05 | bwd_inner_microstep: 5124.33 | bwd_allreduce_microstep: 60.65 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-31 23:24:09,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.89 | bwd_microstep: 4997.76 | bwd_inner_microstep: 4978.50 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794 [2024-07-31 23:24:18,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3749.12 | bwd_microstep: 5030.87 | bwd_inner_microstep: 5011.48 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-31 23:24:27,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.16 | bwd_microstep: 5041.13 | bwd_inner_microstep: 4984.13 | bwd_allreduce_microstep: 56.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3670 [2024-07-31 23:24:35,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 23:24:35,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.97 | bwd_microstep: 4971.90 | bwd_inner_microstep: 4907.58 | bwd_allreduce_microstep: 64.26 | step_microstep: 181.28 [2024-07-31 23:24:35,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29341.24 | bwd: 41393.45 | bwd_inner: 40616.29 | bwd_allreduce: 776.66 | step: 181.96 88%|████████▊ | 1079/1230 [21:12:41<2:55:57, 69.92s/it] {'loss': 1.0982, 'learning_rate': 7.802123663806938e-07, 'epoch': 0.88} 88%|████████▊ | 1079/1230 [21:12:41<2:55:57, 69.92s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2372 [2024-07-31 23:24:44,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.90 | bwd_microstep: 5335.52 | bwd_inner_microstep: 4922.99 | bwd_allreduce_microstep: 412.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3564 [2024-07-31 23:24:53,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.11 | bwd_microstep: 5153.73 | bwd_inner_microstep: 5065.94 | bwd_allreduce_microstep: 87.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-31 23:25:02,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.08 | bwd_microstep: 5015.98 | bwd_inner_microstep: 
4996.61 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-31 23:25:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.72 | bwd_microstep: 5207.22 | bwd_inner_microstep: 5145.62 | bwd_allreduce_microstep: 61.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-31 23:25:20,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.00 | bwd_microstep: 5052.92 | bwd_inner_microstep: 5010.84 | bwd_allreduce_microstep: 42.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-31 23:25:28,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.04 | bwd_microstep: 5110.76 | bwd_inner_microstep: 5062.61 | bwd_allreduce_microstep: 48.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-31 23:25:37,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.70 | bwd_microstep: 5104.35 | bwd_inner_microstep: 5038.97 | bwd_allreduce_microstep: 65.32 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2972 [2024-07-31 23:25:46,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 23:25:46,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.81 | bwd_microstep: 5046.11 | bwd_inner_microstep: 4781.77 | bwd_allreduce_microstep: 264.27 | step_microstep: 182.14 [2024-07-31 23:25:46,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29039.29 | bwd: 41026.58 | bwd_inner: 40025.29 | bwd_allreduce: 1000.80 | step: 182.71 88%|████████▊ | 1080/1230 [21:13:52<2:55:09, 70.06s/it] {'loss': 1.1544, 'learning_rate': 7.700469297384927e-07, 'epoch': 0.88} 88%|████████▊ | 1080/1230 [21:13:52<2:55:09, 70.06s/it]dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3885 [2024-07-31 23:25:55,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.55 | bwd_microstep: 5150.53 | bwd_inner_microstep: 5109.03 | bwd_allreduce_microstep: 41.43 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2047 [2024-07-31 23:26:03,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.25 | bwd_microstep: 5209.22 | bwd_inner_microstep: 4807.09 | bwd_allreduce_microstep: 402.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2236 [2024-07-31 23:26:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.31 | bwd_microstep: 5188.96 | bwd_inner_microstep: 4784.91 | bwd_allreduce_microstep: 403.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-31 23:26:21,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.67 | bwd_microstep: 5177.12 | bwd_inner_microstep: 4773.16 | bwd_allreduce_microstep: 403.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3798 [2024-07-31 23:26:30,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.73 | bwd_microstep: 5016.50 | bwd_inner_microstep: 4997.20 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-31 23:26:38,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.65 | bwd_microstep: 5130.83 | bwd_inner_microstep: 5078.21 | bwd_allreduce_microstep: 52.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-31 23:26:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.96 | bwd_microstep: 4911.58 | bwd_inner_microstep: 4886.01 | 
bwd_allreduce_microstep: 25.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-31 23:26:56,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 23:26:56,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.33 | bwd_microstep: 5033.48 | bwd_inner_microstep: 4979.38 | bwd_allreduce_microstep: 54.03 | step_microstep: 181.10 [2024-07-31 23:26:56,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28857.36 | bwd: 40818.19 | bwd_inner: 39414.92 | bwd_allreduce: 1402.79 | step: 181.68 88%|████████▊ | 1081/1230 [21:15:02<2:53:56, 70.04s/it] {'loss': 1.1848, 'learning_rate': 7.599454987149879e-07, 'epoch': 0.88} 88%|████████▊ | 1081/1230 [21:15:02<2:53:56, 70.04s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3555 [2024-07-31 23:27:05,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.87 | bwd_microstep: 5328.52 | bwd_inner_microstep: 5162.49 | bwd_allreduce_microstep: 165.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-31 23:27:13,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.83 | bwd_microstep: 5154.54 | bwd_inner_microstep: 4752.82 | bwd_allreduce_microstep: 401.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-31 23:27:22,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.77 | bwd_microstep: 4927.46 | bwd_inner_microstep: 4870.07 | bwd_allreduce_microstep: 57.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-31 23:27:30,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.07 | bwd_microstep: 5025.57 | bwd_inner_microstep: 5006.27 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-31 23:27:39,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.43 | bwd_microstep: 4977.36 | bwd_inner_microstep: 4958.03 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-31 23:27:48,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.49 | bwd_microstep: 5187.73 | bwd_inner_microstep: 4785.41 | bwd_allreduce_microstep: 402.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-31 23:27:56,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.02 | bwd_microstep: 5013.43 | bwd_inner_microstep: 4956.34 | bwd_allreduce_microstep: 57.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-31 23:28:05,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 23:28:05,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.50 | bwd_microstep: 4717.30 | bwd_inner_microstep: 4692.50 | bwd_allreduce_microstep: 24.74 | step_microstep: 182.65 [2024-07-31 23:28:05,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28178.88 | bwd: 40331.89 | bwd_inner: 39183.87 | bwd_allreduce: 1147.54 | step: 183.23 88%|████████▊ | 1082/1230 [21:16:10<2:51:53, 69.68s/it] {'loss': 1.1559, 'learning_rate': 7.499081433591049e-07, 'epoch': 0.88} 88%|████████▊ | 1082/1230 [21:16:10<2:51:53, 69.68s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-07-31 23:28:14,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.73 | bwd_microstep: 5403.62 | bwd_inner_microstep: 4988.34 | bwd_allreduce_microstep: 415.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2344 [2024-07-31 
23:28:22,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.89 | bwd_microstep: 5152.50 | bwd_inner_microstep: 4749.89 | bwd_allreduce_microstep: 402.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 23:28:31,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.39 | bwd_microstep: 5182.84 | bwd_inner_microstep: 5107.72 | bwd_allreduce_microstep: 75.05 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-31 23:28:40,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.72 | bwd_microstep: 5205.44 | bwd_inner_microstep: 4798.43 | bwd_allreduce_microstep: 406.94 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2186 [2024-07-31 23:28:48,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3062.05 | bwd_microstep: 5019.34 | bwd_inner_microstep: 4630.85 | bwd_allreduce_microstep: 388.42 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2150 [2024-07-31 23:28:57,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.47 | bwd_microstep: 5067.32 | bwd_inner_microstep: 4674.38 | bwd_allreduce_microstep: 392.87 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2094 [2024-07-31 23:29:05,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.29 | bwd_microstep: 4998.85 | bwd_inner_microstep: 4616.54 | bwd_allreduce_microstep: 382.23 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3674 [2024-07-31 23:29:14,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 23:29:14,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.34 | bwd_microstep: 5163.86 | 
bwd_inner_microstep: 5083.72 | bwd_allreduce_microstep: 80.07 | step_microstep: 182.11 [2024-07-31 23:29:14,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27476.78 | bwd: 41193.74 | bwd_inner: 38649.81 | bwd_allreduce: 2543.43 | step: 182.72 88%|████████▊ | 1083/1230 [21:17:19<2:50:13, 69.48s/it] {'loss': 1.0858, 'learning_rate': 7.399349332754458e-07, 'epoch': 0.88} 88%|████████▊ | 1083/1230 [21:17:19<2:50:13, 69.48s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4015 [2024-07-31 23:29:23,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.19 | bwd_microstep: 5255.20 | bwd_inner_microstep: 5236.02 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572 [2024-07-31 23:29:32,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.32 | bwd_microstep: 5329.91 | bwd_inner_microstep: 5202.30 | bwd_allreduce_microstep: 127.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-31 23:29:41,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.87 | bwd_microstep: 5147.99 | bwd_inner_microstep: 5128.68 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1059 [2024-07-31 23:29:49,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.87 | bwd_microstep: 5198.13 | bwd_inner_microstep: 4797.07 | bwd_allreduce_microstep: 401.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756 [2024-07-31 23:29:58,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.74 | bwd_microstep: 5044.11 | bwd_inner_microstep: 5003.36 | bwd_allreduce_microstep: 40.69 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 
[2024-07-31 23:30:07,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.08 | bwd_microstep: 4883.26 | bwd_inner_microstep: 4863.99 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-31 23:30:15,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.27 | bwd_microstep: 4912.65 | bwd_inner_microstep: 4887.80 | bwd_allreduce_microstep: 24.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3659 [2024-07-31 23:30:24,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-31 23:30:24,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.32 | bwd_microstep: 4901.70 | bwd_inner_microstep: 4877.01 | bwd_allreduce_microstep: 24.62 | step_microstep: 181.84 [2024-07-31 23:30:24,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29432.56 | bwd: 40672.94 | bwd_inner: 39996.17 | bwd_allreduce: 676.28 | step: 182.52 88%|████████▊ | 1084/1230 [21:18:30<2:49:45, 69.77s/it] {'loss': 1.1171, 'learning_rate': 7.300259376237795e-07, 'epoch': 0.88} 88%|████████▊ | 1084/1230 [21:18:30<2:49:45, 69.77s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823 [2024-07-31 23:30:33,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.53 | bwd_microstep: 5274.30 | bwd_inner_microstep: 5210.23 | bwd_allreduce_microstep: 64.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2301 [2024-07-31 23:30:42,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.72 | bwd_microstep: 5239.92 | bwd_inner_microstep: 4833.54 | bwd_allreduce_microstep: 406.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3597 [2024-07-31 23:30:51,065] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3619.25 | bwd_microstep: 5196.74 | bwd_inner_microstep: 5102.15 | bwd_allreduce_microstep: 94.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673 [2024-07-31 23:30:59,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.03 | bwd_microstep: 4924.20 | bwd_inner_microstep: 4897.29 | bwd_allreduce_microstep: 26.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-31 23:31:07,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.87 | bwd_microstep: 4927.64 | bwd_inner_microstep: 4546.05 | bwd_allreduce_microstep: 381.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-31 23:31:16,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.78 | bwd_microstep: 4887.18 | bwd_inner_microstep: 4867.90 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-31 23:31:24,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3430.32 | bwd_microstep: 5021.71 | bwd_inner_microstep: 4634.04 | bwd_allreduce_microstep: 387.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-31 23:31:33,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 23:31:33,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3452.30 | bwd_microstep: 5048.53 | bwd_inner_microstep: 4658.30 | bwd_allreduce_microstep: 390.17 | step_microstep: 181.25 [2024-07-31 23:31:33,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28273.70 | bwd: 40520.20 | bwd_inner: 38749.44 | bwd_allreduce: 1770.27 | step: 181.85 88%|████████▊ | 1085/1230 [21:19:39<2:48:07, 69.57s/it] {'loss': 1.1894, 'learning_rate': 7.201812251185847e-07, 
'epoch': 0.88} 88%|████████▊ | 1085/1230 [21:19:39<2:48:07, 69.57s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2095 [2024-07-31 23:31:42,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.25 | bwd_microstep: 5602.33 | bwd_inner_microstep: 5170.04 | bwd_allreduce_microstep: 432.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-31 23:31:51,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.93 | bwd_microstep: 5175.92 | bwd_inner_microstep: 5093.39 | bwd_allreduce_microstep: 82.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-31 23:32:00,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.13 | bwd_microstep: 5042.29 | bwd_inner_microstep: 4978.60 | bwd_allreduce_microstep: 63.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-31 23:32:09,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.84 | bwd_microstep: 5226.82 | bwd_inner_microstep: 4822.24 | bwd_allreduce_microstep: 404.51 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3732 [2024-07-31 23:32:17,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.32 | bwd_microstep: 4964.44 | bwd_inner_microstep: 4922.99 | bwd_allreduce_microstep: 41.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-31 23:32:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.30 | bwd_microstep: 4826.63 | bwd_inner_microstep: 4807.24 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-31 23:32:34,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3598.09 | bwd_microstep: 5077.18 | bwd_inner_microstep: 5017.18 | bwd_allreduce_microstep: 59.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-31 23:32:43,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 23:32:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.28 | bwd_microstep: 4919.79 | bwd_inner_microstep: 4899.17 | bwd_allreduce_microstep: 20.56 | step_microstep: 181.35 [2024-07-31 23:32:43,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28486.06 | bwd: 40835.39 | bwd_inner: 39710.79 | bwd_allreduce: 1124.12 | step: 181.93 88%|████████▊ | 1086/1230 [21:20:49<2:47:01, 69.60s/it] {'loss': 1.1294, 'learning_rate': 7.104008640285642e-07, 'epoch': 0.88} 88%|████████▊ | 1086/1230 [21:20:49<2:47:01, 69.60s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2408 [2024-07-31 23:32:52,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.44 | bwd_microstep: 5409.43 | bwd_inner_microstep: 4993.46 | bwd_allreduce_microstep: 415.90 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3797 [2024-07-31 23:33:01,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.52 | bwd_microstep: 5355.79 | bwd_inner_microstep: 5293.31 | bwd_allreduce_microstep: 62.42 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3628 [2024-07-31 23:33:10,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.28 | bwd_microstep: 5152.10 | bwd_inner_microstep: 5073.96 | bwd_allreduce_microstep: 78.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-31 23:33:18,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.12 | bwd_microstep: 5161.84 | bwd_inner_microstep: 
5085.89 | bwd_allreduce_microstep: 75.88 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3739 [2024-07-31 23:33:27,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.43 | bwd_microstep: 5183.91 | bwd_inner_microstep: 5092.84 | bwd_allreduce_microstep: 91.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-31 23:33:36,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.96 | bwd_microstep: 5115.12 | bwd_inner_microstep: 5050.46 | bwd_allreduce_microstep: 64.60 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-31 23:33:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.64 | bwd_microstep: 4919.05 | bwd_inner_microstep: 4897.38 | bwd_allreduce_microstep: 21.61 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2130 [2024-07-31 23:33:54,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-31 23:33:54,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.26 | bwd_microstep: 5152.21 | bwd_inner_microstep: 4752.35 | bwd_allreduce_microstep: 399.79 | step_microstep: 182.01 [2024-07-31 23:33:54,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28997.56 | bwd: 41449.45 | bwd_inner: 40239.59 | bwd_allreduce: 1209.36 | step: 182.62 88%|████████▊ | 1087/1230 [21:21:59<2:46:42, 69.95s/it] {'loss': 1.1218, 'learning_rate': 7.006849221761736e-07, 'epoch': 0.88} 88%|████████▊ | 1087/1230 [21:21:59<2:46:42, 69.95s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 23:34:03,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.92 | bwd_microstep: 5318.26 | bwd_inner_microstep: 5291.35 | bwd_allreduce_microstep: 26.85 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2306 [2024-07-31 23:34:11,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.86 | bwd_microstep: 5178.39 | bwd_inner_microstep: 4774.44 | bwd_allreduce_microstep: 403.89 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-31 23:34:20,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.19 | bwd_microstep: 5234.42 | bwd_inner_microstep: 5145.62 | bwd_allreduce_microstep: 88.73 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 23:34:29,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.94 | bwd_microstep: 5071.01 | bwd_inner_microstep: 5040.64 | bwd_allreduce_microstep: 30.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-31 23:34:38,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.26 | bwd_microstep: 4972.72 | bwd_inner_microstep: 4953.36 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2117 [2024-07-31 23:34:46,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.10 | bwd_microstep: 5087.12 | bwd_inner_microstep: 4693.37 | bwd_allreduce_microstep: 393.68 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-31 23:34:55,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.09 | bwd_microstep: 5034.06 | bwd_inner_microstep: 4979.44 | bwd_allreduce_microstep: 54.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2136 [2024-07-31 23:35:04,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 23:35:04,520] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 3551.37 | bwd_microstep: 5234.19 | bwd_inner_microstep: 4826.87 | bwd_allreduce_microstep: 407.26 | step_microstep: 183.11 [2024-07-31 23:35:04,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28993.63 | bwd: 41130.16 | bwd_inner: 39705.01 | bwd_allreduce: 1424.66 | step: 183.81 88%|████████▊ | 1088/1230 [21:23:10<2:45:54, 70.10s/it] {'loss': 1.1276, 'learning_rate': 6.910334669371433e-07, 'epoch': 0.88} 88%|████████▊ | 1088/1230 [21:23:10<2:45:54, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4007 [2024-07-31 23:35:14,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.86 | bwd_microstep: 5929.95 | bwd_inner_microstep: 5550.29 | bwd_allreduce_microstep: 379.60 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3885 [2024-07-31 23:35:22,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.27 | bwd_microstep: 5208.27 | bwd_inner_microstep: 5162.41 | bwd_allreduce_microstep: 45.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-31 23:35:31,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.50 | bwd_microstep: 5053.86 | bwd_inner_microstep: 5027.13 | bwd_allreduce_microstep: 26.66 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3734 [2024-07-31 23:35:40,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.91 | bwd_microstep: 5114.26 | bwd_inner_microstep: 5056.01 | bwd_allreduce_microstep: 58.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-31 23:35:49,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.12 | bwd_microstep: 5184.34 | bwd_inner_microstep: 4784.20 | bwd_allreduce_microstep: 400.07 | step_microstep: 0.08 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-31 23:35:57,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.91 | bwd_microstep: 4968.62 | bwd_inner_microstep: 4914.31 | bwd_allreduce_microstep: 54.25 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2149 [2024-07-31 23:36:06,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.35 | bwd_microstep: 5125.89 | bwd_inner_microstep: 4727.42 | bwd_allreduce_microstep: 398.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-31 23:36:15,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-31 23:36:15,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.62 | bwd_microstep: 5055.82 | bwd_inner_microstep: 4995.96 | bwd_allreduce_microstep: 59.79 | step_microstep: 182.95 [2024-07-31 23:36:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28780.44 | bwd: 41641.00 | bwd_inner: 40217.68 | bwd_allreduce: 1422.84 | step: 183.54 89%|████████▊ | 1089/1230 [21:24:21<2:45:11, 70.30s/it] {'loss': 1.1393, 'learning_rate': 6.814465652400237e-07, 'epoch': 0.89} 89%|████████▊ | 1089/1230 [21:24:21<2:45:11, 70.30s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2801 [2024-07-31 23:36:24,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.17 | bwd_microstep: 5329.58 | bwd_inner_microstep: 4921.63 | bwd_allreduce_microstep: 407.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2339 [2024-07-31 23:36:33,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.38 | bwd_microstep: 5230.13 | bwd_inner_microstep: 4824.10 | bwd_allreduce_microstep: 405.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3857 
[2024-07-31 23:36:41,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.40 | bwd_microstep: 5101.55 | bwd_inner_microstep: 5081.04 | bwd_allreduce_microstep: 20.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-31 23:36:50,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.19 | bwd_microstep: 4998.69 | bwd_inner_microstep: 4979.36 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-31 23:36:59,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.25 | bwd_microstep: 5020.31 | bwd_inner_microstep: 4967.35 | bwd_allreduce_microstep: 52.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-31 23:37:08,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.98 | bwd_microstep: 5167.58 | bwd_inner_microstep: 5090.96 | bwd_allreduce_microstep: 76.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3719 [2024-07-31 23:37:16,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.01 | bwd_microstep: 5040.92 | bwd_inner_microstep: 4977.34 | bwd_allreduce_microstep: 63.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 23:37:25,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 23:37:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.60 | bwd_microstep: 4998.48 | bwd_inner_microstep: 4944.88 | bwd_allreduce_microstep: 53.53 | step_microstep: 181.82 [2024-07-31 23:37:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28928.89 | bwd: 40887.22 | bwd_inner: 39786.59 | bwd_allreduce: 1100.14 | step: 182.38 89%|████████▊ | 1090/1230 
[21:25:31<2:43:55, 70.25s/it] {'loss': 1.1422, 'learning_rate': 6.719242835657125e-07, 'epoch': 0.89} 89%|████████▊ | 1090/1230 [21:25:31<2:43:55, 70.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 23:37:33,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3374.15 | bwd_microstep: 5171.06 | bwd_inner_microstep: 5151.96 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2317 [2024-07-31 23:37:42,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3069.36 | bwd_microstep: 5024.14 | bwd_inner_microstep: 4639.00 | bwd_allreduce_microstep: 385.07 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2824 [2024-07-31 23:37:50,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.39 | bwd_microstep: 5267.52 | bwd_inner_microstep: 4858.54 | bwd_allreduce_microstep: 408.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-31 23:37:59,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.95 | bwd_microstep: 5140.82 | bwd_inner_microstep: 5071.30 | bwd_allreduce_microstep: 69.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-31 23:38:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.80 | bwd_microstep: 5189.26 | bwd_inner_microstep: 5132.46 | bwd_allreduce_microstep: 56.72 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3757 [2024-07-31 23:38:17,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.91 | bwd_microstep: 5140.48 | bwd_inner_microstep: 5079.11 | bwd_allreduce_microstep: 61.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 
[2024-07-31 23:38:26,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.32 | bwd_microstep: 5001.95 | bwd_inner_microstep: 4982.46 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 23:38:34,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-31 23:38:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.04 | bwd_microstep: 5067.11 | bwd_inner_microstep: 5001.01 | bwd_allreduce_microstep: 66.03 | step_microstep: 183.05 [2024-07-31 23:38:34,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28161.83 | bwd: 41002.31 | bwd_inner: 39915.77 | bwd_allreduce: 1086.04 | step: 183.63 89%|████████▊ | 1091/1230 [21:26:40<2:42:13, 70.02s/it] {'loss': 1.1709, 'learning_rate': 6.62466687947001e-07, 'epoch': 0.89} 89%|████████▊ | 1091/1230 [21:26:40<2:42:13, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 23:38:43,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3385.89 | bwd_microstep: 5239.30 | bwd_inner_microstep: 5211.33 | bwd_allreduce_microstep: 27.91 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-31 23:38:52,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.76 | bwd_microstep: 5332.13 | bwd_inner_microstep: 5265.23 | bwd_allreduce_microstep: 66.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3819 [2024-07-31 23:39:00,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.00 | bwd_microstep: 4924.48 | bwd_inner_microstep: 4896.22 | bwd_allreduce_microstep: 28.20 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2844 [2024-07-31 23:39:09,553] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3571.39 | bwd_microstep: 5212.61 | bwd_inner_microstep: 4805.25 | bwd_allreduce_microstep: 407.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3648 [2024-07-31 23:39:18,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.50 | bwd_microstep: 5042.01 | bwd_inner_microstep: 4962.56 | bwd_allreduce_microstep: 79.38 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-31 23:39:26,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.97 | bwd_microstep: 5185.77 | bwd_inner_microstep: 4782.82 | bwd_allreduce_microstep: 402.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-31 23:39:35,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.77 | bwd_microstep: 5187.98 | bwd_inner_microstep: 4786.20 | bwd_allreduce_microstep: 401.71 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2168 [2024-07-31 23:39:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 23:39:44,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.04 | bwd_microstep: 5069.91 | bwd_inner_microstep: 4677.74 | bwd_allreduce_microstep: 392.10 | step_microstep: 181.98 [2024-07-31 23:39:44,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27987.22 | bwd: 41194.18 | bwd_inner: 39387.29 | bwd_allreduce: 1806.40 | step: 182.66 89%|████████▉ | 1092/1230 [21:27:50<2:40:42, 69.87s/it] {'loss': 1.1388, 'learning_rate': 6.53073843968104e-07, 'epoch': 0.89} 89%|████████▉ | 1092/1230 [21:27:50<2:40:42, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3885 [2024-07-31 23:39:53,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.69 | bwd_microstep: 5120.16 | 
bwd_inner_microstep: 5088.02 | bwd_allreduce_microstep: 32.06 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2337 [2024-07-31 23:40:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.72 | bwd_microstep: 5493.05 | bwd_inner_microstep: 5070.00 | bwd_allreduce_microstep: 422.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-31 23:40:10,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.36 | bwd_microstep: 5025.13 | bwd_inner_microstep: 5005.81 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3727 [2024-07-31 23:40:19,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.82 | bwd_microstep: 5072.51 | bwd_inner_microstep: 5039.64 | bwd_allreduce_microstep: 32.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-31 23:40:28,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.40 | bwd_microstep: 5017.64 | bwd_inner_microstep: 4998.36 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-31 23:40:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.40 | bwd_microstep: 4989.57 | bwd_inner_microstep: 4970.12 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-31 23:40:45,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.30 | bwd_microstep: 4994.18 | bwd_inner_microstep: 4944.73 | bwd_allreduce_microstep: 49.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3670 [2024-07-31 23:40:54,603] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | optimizer_step: 92.56 [2024-07-31 23:40:54,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.06 | bwd_microstep: 5026.60 | bwd_inner_microstep: 4955.15 | bwd_allreduce_microstep: 71.38 | step_microstep: 181.73 [2024-07-31 23:40:54,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29108.66 | bwd: 40738.82 | bwd_inner: 40071.78 | bwd_allreduce: 666.55 | step: 182.31 89%|████████▉ | 1093/1230 [21:29:00<2:39:45, 69.96s/it] {'loss': 1.0765, 'learning_rate': 6.437458167642164e-07, 'epoch': 0.89} 89%|████████▉ | 1093/1230 [21:29:00<2:39:45, 69.96s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 23:41:03,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.49 | bwd_microstep: 5164.56 | bwd_inner_microstep: 5145.38 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2032 [2024-07-31 23:41:11,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.59 | bwd_microstep: 5096.95 | bwd_inner_microstep: 4704.55 | bwd_allreduce_microstep: 392.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 23:41:20,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.22 | bwd_microstep: 5371.80 | bwd_inner_microstep: 5292.77 | bwd_allreduce_microstep: 78.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 23:41:29,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.76 | bwd_microstep: 5181.89 | bwd_inner_microstep: 5146.14 | bwd_allreduce_microstep: 35.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-31 23:41:38,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.77 | bwd_microstep: 5131.08 | 
bwd_inner_microstep: 5058.37 | bwd_allreduce_microstep: 72.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-31 23:41:47,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.97 | bwd_microstep: 5097.76 | bwd_inner_microstep: 5034.38 | bwd_allreduce_microstep: 63.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-31 23:41:55,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.09 | bwd_microstep: 5193.17 | bwd_inner_microstep: 4788.56 | bwd_allreduce_microstep: 404.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-31 23:42:04,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-31 23:42:04,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.93 | bwd_microstep: 5000.47 | bwd_inner_microstep: 4981.13 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.97 [2024-07-31 23:42:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28700.72 | bwd: 41237.65 | bwd_inner: 40151.22 | bwd_allreduce: 1085.94 | step: 182.55 89%|████████▉ | 1094/1230 [21:30:10<2:38:47, 70.06s/it] {'loss': 1.0951, 'learning_rate': 6.344826710210584e-07, 'epoch': 0.89} 89%|████████▉ | 1094/1230 [21:30:10<2:38:47, 70.06s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3948 [2024-07-31 23:42:13,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.62 | bwd_microstep: 5182.29 | bwd_inner_microstep: 5163.11 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3570 [2024-07-31 23:42:22,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3131.79 | bwd_microstep: 5035.66 | bwd_inner_microstep: 4962.40 | bwd_allreduce_microstep: 73.20 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251 [2024-07-31 23:42:29,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2993.13 | bwd_microstep: 4845.29 | bwd_inner_microstep: 4473.12 | bwd_allreduce_microstep: 372.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-31 23:42:38,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.93 | bwd_microstep: 5071.29 | bwd_inner_microstep: 5008.45 | bwd_allreduce_microstep: 62.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3896 [2024-07-31 23:42:47,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.73 | bwd_microstep: 5168.43 | bwd_inner_microstep: 5129.86 | bwd_allreduce_microstep: 38.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-31 23:42:56,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.43 | bwd_microstep: 5225.36 | bwd_inner_microstep: 5142.97 | bwd_allreduce_microstep: 82.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-31 23:43:05,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.38 | bwd_microstep: 5234.05 | bwd_inner_microstep: 4830.17 | bwd_allreduce_microstep: 403.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 23:43:14,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-31 23:43:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.66 | bwd_microstep: 5186.75 | bwd_inner_microstep: 5113.88 | bwd_allreduce_microstep: 72.80 | step_microstep: 180.84 [2024-07-31 23:43:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27930.56 | bwd: 40949.11 | 
bwd_inner: 39823.89 | bwd_allreduce: 1124.74 | step: 181.43 89%|████████▉ | 1095/1230 [21:31:19<2:37:03, 69.80s/it] {'loss': 1.101, 'learning_rate': 6.252844709744266e-07, 'epoch': 0.89} 89%|████████▉ | 1095/1230 [21:31:19<2:37:03, 69.80s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3973 [2024-07-31 23:43:23,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.55 | bwd_microstep: 5411.63 | bwd_inner_microstep: 5370.84 | bwd_allreduce_microstep: 40.73 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3878 [2024-07-31 23:43:32,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.58 | bwd_microstep: 5216.93 | bwd_inner_microstep: 5161.33 | bwd_allreduce_microstep: 55.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-31 23:43:41,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.26 | bwd_microstep: 5214.84 | bwd_inner_microstep: 5126.31 | bwd_allreduce_microstep: 88.45 | step_microstep: 0.11 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-31 23:43:50,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.78 | bwd_microstep: 5159.89 | bwd_inner_microstep: 5122.20 | bwd_allreduce_microstep: 37.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-31 23:43:58,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.51 | bwd_microstep: 5149.85 | bwd_inner_microstep: 5093.48 | bwd_allreduce_microstep: 56.31 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-31 23:44:07,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.69 | bwd_microstep: 5026.12 | bwd_inner_microstep: 4968.70 | bwd_allreduce_microstep: 57.35 | step_microstep: 
0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3657 [2024-07-31 23:44:15,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.72 | bwd_microstep: 4795.90 | bwd_inner_microstep: 4776.43 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3660 [2024-07-31 23:44:24,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-31 23:44:24,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.25 | bwd_microstep: 5033.62 | bwd_inner_microstep: 4955.08 | bwd_allreduce_microstep: 78.47 | step_microstep: 181.36 [2024-07-31 23:44:24,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29126.22 | bwd: 41008.78 | bwd_inner: 40574.31 | bwd_allreduce: 433.97 | step: 182.09 89%|████████▉ | 1096/1230 [21:32:30<2:36:20, 70.00s/it] {'loss': 1.0931, 'learning_rate': 6.161512804097414e-07, 'epoch': 0.89} 89%|████████▉ | 1096/1230 [21:32:30<2:36:20, 70.00s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3888 [2024-07-31 23:44:33,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.92 | bwd_microstep: 5285.92 | bwd_inner_microstep: 5233.29 | bwd_allreduce_microstep: 52.56 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-31 23:44:42,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.19 | bwd_microstep: 5282.26 | bwd_inner_microstep: 5218.54 | bwd_allreduce_microstep: 63.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-31 23:44:51,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.19 | bwd_microstep: 5167.50 | bwd_inner_microstep: 5118.13 | bwd_allreduce_microstep: 49.30 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3753 [2024-07-31 23:45:00,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.55 | bwd_microstep: 5017.33 | bwd_inner_microstep: 4997.63 | bwd_allreduce_microstep: 19.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3625 [2024-07-31 23:45:07,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2895.29 | bwd_microstep: 4522.23 | bwd_inner_microstep: 4502.78 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 23:45:16,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.46 | bwd_microstep: 5169.28 | bwd_inner_microstep: 4766.27 | bwd_allreduce_microstep: 402.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 23:45:25,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.24 | bwd_microstep: 5431.79 | bwd_inner_microstep: 4915.66 | bwd_allreduce_microstep: 516.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 23:45:33,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-31 23:45:33,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.41 | bwd_microstep: 4762.37 | bwd_inner_microstep: 4736.95 | bwd_allreduce_microstep: 25.36 | step_microstep: 182.29 [2024-07-31 23:45:33,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27867.14 | bwd: 40638.67 | bwd_inner: 39489.19 | bwd_allreduce: 1148.99 | step: 182.89 89%|████████▉ | 1097/1230 [21:33:39<2:34:23, 69.65s/it] {'loss': 1.1475, 'learning_rate': 6.070831626616236e-07, 'epoch': 0.89} 89%|████████▉ | 1097/1230 [21:33:39<2:34:23, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3857 [2024-07-31 23:45:42,346] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3802.92 | bwd_microstep: 5122.01 | bwd_inner_microstep: 5102.84 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3255 [2024-07-31 23:45:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.56 | bwd_microstep: 5281.79 | bwd_inner_microstep: 5033.87 | bwd_allreduce_microstep: 247.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-31 23:46:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.10 | bwd_microstep: 5103.79 | bwd_inner_microstep: 5073.30 | bwd_allreduce_microstep: 30.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 23:46:08,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.37 | bwd_microstep: 4998.72 | bwd_inner_microstep: 4979.37 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3639 [2024-07-31 23:46:17,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.47 | bwd_microstep: 5116.25 | bwd_inner_microstep: 5035.35 | bwd_allreduce_microstep: 80.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-07-31 23:46:26,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.28 | bwd_microstep: 5176.92 | bwd_inner_microstep: 5129.44 | bwd_allreduce_microstep: 47.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3756 [2024-07-31 23:46:34,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.59 | bwd_microstep: 4811.58 | bwd_inner_microstep: 4792.29 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 
[2024-07-31 23:46:42,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-31 23:46:42,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.79 | bwd_microstep: 4735.01 | bwd_inner_microstep: 4710.85 | bwd_allreduce_microstep: 24.09 | step_microstep: 182.84 [2024-07-31 23:46:42,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28571.98 | bwd: 40346.05 | bwd_inner: 39857.26 | bwd_allreduce: 488.29 | step: 183.42 89%|████████▉ | 1098/1230 [21:34:48<2:32:58, 69.53s/it] {'loss': 1.1936, 'learning_rate': 5.980801806134318e-07, 'epoch': 0.89} 89%|████████▉ | 1098/1230 [21:34:48<2:32:58, 69.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3885 [2024-07-31 23:46:51,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.64 | bwd_microstep: 5338.82 | bwd_inner_microstep: 5272.54 | bwd_allreduce_microstep: 66.21 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3575 [2024-07-31 23:47:00,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.06 | bwd_microstep: 5304.18 | bwd_inner_microstep: 5153.33 | bwd_allreduce_microstep: 150.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-31 23:47:09,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.63 | bwd_microstep: 5148.46 | bwd_inner_microstep: 5100.52 | bwd_allreduce_microstep: 47.88 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2077 [2024-07-31 23:47:18,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.23 | bwd_microstep: 5224.72 | bwd_inner_microstep: 4819.10 | bwd_allreduce_microstep: 405.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-31 23:47:26,919] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3608.74 | bwd_microstep: 5111.33 | bwd_inner_microstep: 5063.68 | bwd_allreduce_microstep: 47.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3813 [2024-07-31 23:47:35,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.25 | bwd_microstep: 5048.91 | bwd_inner_microstep: 5029.66 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-31 23:47:43,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.79 | bwd_microstep: 4736.22 | bwd_inner_microstep: 4709.71 | bwd_allreduce_microstep: 26.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-31 23:47:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-31 23:47:52,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.53 | bwd_microstep: 4986.65 | bwd_inner_microstep: 4933.10 | bwd_allreduce_microstep: 53.48 | step_microstep: 181.41 [2024-07-31 23:47:52,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28563.78 | bwd: 40899.27 | bwd_inner: 40081.58 | bwd_allreduce: 817.21 | step: 182.00 89%|████████▉ | 1099/1230 [21:35:58<2:31:59, 69.61s/it] {'loss': 1.1667, 'learning_rate': 5.891423966968424e-07, 'epoch': 0.89} 89%|████████▉ | 1099/1230 [21:35:58<2:31:59, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4054 [2024-07-31 23:48:01,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.00 | bwd_microstep: 5615.26 | bwd_inner_microstep: 5553.87 | bwd_allreduce_microstep: 61.33 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2365 [2024-07-31 23:48:10,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.20 | bwd_microstep: 5444.41 | 
bwd_inner_microstep: 5025.19 | bwd_allreduce_microstep: 419.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-31 23:48:19,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.32 | bwd_microstep: 4984.98 | bwd_inner_microstep: 4965.38 | bwd_allreduce_microstep: 19.51 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-31 23:48:28,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.46 | bwd_microstep: 5182.17 | bwd_inner_microstep: 5125.82 | bwd_allreduce_microstep: 56.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 23:48:37,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.72 | bwd_microstep: 5123.74 | bwd_inner_microstep: 5077.86 | bwd_allreduce_microstep: 45.81 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-31 23:48:46,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.05 | bwd_microstep: 5180.62 | bwd_inner_microstep: 5124.61 | bwd_allreduce_microstep: 55.93 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2186 [2024-07-31 23:48:54,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.60 | bwd_microstep: 5147.70 | bwd_inner_microstep: 4747.07 | bwd_allreduce_microstep: 400.57 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-31 23:49:03,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-31 23:49:03,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.30 | bwd_microstep: 4982.19 | bwd_inner_microstep: 4962.85 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.53 [2024-07-31 23:49:03,670] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29226.55 | bwd: 41661.07 | bwd_inner: 40582.59 | bwd_allreduce: 1077.96 | step: 182.17 89%|████████▉ | 1100/1230 [21:37:09<2:31:52, 70.10s/it] {'loss': 1.1223, 'learning_rate': 5.80269872891408e-07, 'epoch': 0.89} 89%|████████▉ | 1100/1230 [21:37:09<2:31:52, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-07-31 23:49:12,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.22 | bwd_microstep: 5187.11 | bwd_inner_microstep: 5168.12 | bwd_allreduce_microstep: 18.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-31 23:49:21,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.60 | bwd_microstep: 5334.06 | bwd_inner_microstep: 4918.45 | bwd_allreduce_microstep: 415.53 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3605 [2024-07-31 23:49:30,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.53 | bwd_microstep: 5255.99 | bwd_inner_microstep: 5155.99 | bwd_allreduce_microstep: 99.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-31 23:49:38,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.77 | bwd_microstep: 4919.87 | bwd_inner_microstep: 4540.27 | bwd_allreduce_microstep: 379.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-31 23:49:46,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3072.79 | bwd_microstep: 5035.23 | bwd_inner_microstep: 4648.28 | bwd_allreduce_microstep: 386.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3641 [2024-07-31 23:49:55,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.40 | bwd_microstep: 5056.62 | 
bwd_inner_microstep: 4974.63 | bwd_allreduce_microstep: 81.92 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-31 23:50:03,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.95 | bwd_microstep: 5227.15 | bwd_inner_microstep: 4822.51 | bwd_allreduce_microstep: 404.58 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2143 [2024-07-31 23:50:12,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 23:50:12,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.09 | bwd_microstep: 5098.10 | bwd_inner_microstep: 4702.81 | bwd_allreduce_microstep: 395.22 | step_microstep: 181.26 [2024-07-31 23:50:12,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27622.26 | bwd: 41114.12 | bwd_inner: 38931.00 | bwd_allreduce: 2182.63 | step: 181.87 90%|████████▉ | 1101/1230 [21:38:18<2:30:02, 69.79s/it] {'loss': 1.123, 'learning_rate': 5.714626707241411e-07, 'epoch': 0.9} 90%|████████▉ | 1101/1230 [21:38:18<2:30:02, 69.79s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3604 [2024-07-31 23:50:21,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.08 | bwd_microstep: 5410.73 | bwd_inner_microstep: 5245.96 | bwd_allreduce_microstep: 164.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3821 [2024-07-31 23:50:30,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3323.93 | bwd_microstep: 5124.64 | bwd_inner_microstep: 5073.38 | bwd_allreduce_microstep: 51.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-31 23:50:39,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.33 | bwd_microstep: 5041.49 | bwd_inner_microstep: 5018.70 | bwd_allreduce_microstep: 22.72 | 
step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2848 [2024-07-31 23:50:47,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.14 | bwd_microstep: 5146.27 | bwd_inner_microstep: 4745.18 | bwd_allreduce_microstep: 401.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-31 23:50:56,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.30 | bwd_microstep: 5004.03 | bwd_inner_microstep: 4984.06 | bwd_allreduce_microstep: 19.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-31 23:51:05,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.65 | bwd_microstep: 5100.49 | bwd_inner_microstep: 5051.55 | bwd_allreduce_microstep: 48.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 23:51:13,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.22 | bwd_microstep: 4970.13 | bwd_inner_microstep: 4923.13 | bwd_allreduce_microstep: 46.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-31 23:51:22,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-31 23:51:22,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.94 | bwd_microstep: 5202.84 | bwd_inner_microstep: 5142.12 | bwd_allreduce_microstep: 60.65 | step_microstep: 181.39 [2024-07-31 23:51:22,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28819.51 | bwd: 41000.59 | bwd_inner: 40184.04 | bwd_allreduce: 816.06 | step: 181.98 90%|████████▉ | 1102/1230 [21:39:28<2:29:06, 69.90s/it] {'loss': 1.1255, 'learning_rate': 5.627208512690641e-07, 'epoch': 0.9} 90%|████████▉ | 1102/1230 [21:39:28<2:29:06, 69.90s/it]dynamic ViT batch size: 22, images per sample: 11.0, 
dynamic token length: 3605 [2024-07-31 23:51:31,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.21 | bwd_microstep: 5279.12 | bwd_inner_microstep: 5192.58 | bwd_allreduce_microstep: 86.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-31 23:51:40,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3364.49 | bwd_microstep: 5091.93 | bwd_inner_microstep: 5050.91 | bwd_allreduce_microstep: 40.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-31 23:51:49,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.07 | bwd_microstep: 5001.18 | bwd_inner_microstep: 4978.82 | bwd_allreduce_microstep: 22.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3821 [2024-07-31 23:51:57,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.94 | bwd_microstep: 5046.10 | bwd_inner_microstep: 5026.72 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-31 23:52:06,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.11 | bwd_microstep: 5004.53 | bwd_inner_microstep: 4985.20 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3695 [2024-07-31 23:52:15,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.87 | bwd_microstep: 5041.10 | bwd_inner_microstep: 4976.56 | bwd_allreduce_microstep: 64.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-31 23:52:23,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.64 | bwd_microstep: 4902.64 | bwd_inner_microstep: 4882.59 | bwd_allreduce_microstep: 19.99 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-31 23:52:32,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-31 23:52:32,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.76 | bwd_microstep: 4957.36 | bwd_inner_microstep: 4914.17 | bwd_allreduce_microstep: 43.12 | step_microstep: 182.47 [2024-07-31 23:52:32,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29066.99 | bwd: 40323.95 | bwd_inner: 40007.50 | bwd_allreduce: 315.97 | step: 183.06 90%|████████▉ | 1103/1230 [21:40:38<2:27:50, 69.85s/it] {'loss': 1.1056, 'learning_rate': 5.540444751468122e-07, 'epoch': 0.9} 90%|████████▉ | 1103/1230 [21:40:38<2:27:50, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305 [2024-07-31 23:52:41,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.00 | bwd_microstep: 5635.53 | bwd_inner_microstep: 5204.04 | bwd_allreduce_microstep: 431.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-31 23:52:50,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.37 | bwd_microstep: 5136.87 | bwd_inner_microstep: 5064.70 | bwd_allreduce_microstep: 72.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3812 [2024-07-31 23:52:59,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.66 | bwd_microstep: 5126.20 | bwd_inner_microstep: 5066.50 | bwd_allreduce_microstep: 59.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-31 23:53:08,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.12 | bwd_microstep: 5281.04 | bwd_inner_microstep: 4871.44 | bwd_allreduce_microstep: 409.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3633 [2024-07-31 23:53:16,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.54 | bwd_microstep: 4997.14 | bwd_inner_microstep: 4940.17 | bwd_allreduce_microstep: 56.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-31 23:53:25,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.24 | bwd_microstep: 5208.30 | bwd_inner_microstep: 4803.71 | bwd_allreduce_microstep: 404.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-31 23:53:34,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.66 | bwd_microstep: 5185.11 | bwd_inner_microstep: 5104.72 | bwd_allreduce_microstep: 80.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-07-31 23:53:43,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 23:53:43,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.65 | bwd_microstep: 4874.31 | bwd_inner_microstep: 4854.21 | bwd_allreduce_microstep: 20.02 | step_microstep: 181.54 [2024-07-31 23:53:43,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28787.14 | bwd: 41444.49 | bwd_inner: 39909.45 | bwd_allreduce: 1534.56 | step: 182.12 90%|████████▉ | 1104/1230 [21:41:49<2:27:07, 70.06s/it] {'loss': 1.123, 'learning_rate': 5.454336025241869e-07, 'epoch': 0.9} 90%|████████▉ | 1104/1230 [21:41:49<2:27:07, 70.06s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250 [2024-07-31 23:53:51,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3099.07 | bwd_microstep: 5192.27 | bwd_inner_microstep: 4797.59 | bwd_allreduce_microstep: 394.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-31 23:54:00,483] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3651.10 | bwd_microstep: 5325.44 | bwd_inner_microstep: 5225.97 | bwd_allreduce_microstep: 99.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220 [2024-07-31 23:54:09,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.38 | bwd_microstep: 5187.99 | bwd_inner_microstep: 4784.77 | bwd_allreduce_microstep: 403.14 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1212 [2024-07-31 23:54:17,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.51 | bwd_microstep: 5008.78 | bwd_inner_microstep: 4625.17 | bwd_allreduce_microstep: 383.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-31 23:54:25,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.81 | bwd_microstep: 5171.77 | bwd_inner_microstep: 4767.74 | bwd_allreduce_microstep: 403.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-31 23:54:34,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.87 | bwd_microstep: 5047.79 | bwd_inner_microstep: 4988.13 | bwd_allreduce_microstep: 59.59 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-31 23:54:43,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.49 | bwd_microstep: 5013.46 | bwd_inner_microstep: 4957.46 | bwd_allreduce_microstep: 55.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-31 23:54:51,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-31 23:54:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3453.24 | bwd_microstep: 5009.22 | bwd_inner_microstep: 4619.31 | bwd_allreduce_microstep: 389.85 | 
step_microstep: 182.86 [2024-07-31 23:54:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27316.34 | bwd: 40956.71 | bwd_inner: 38766.08 | bwd_allreduce: 2190.14 | step: 183.45 90%|████████▉ | 1105/1230 [21:42:57<2:25:02, 69.62s/it] {'loss': 1.0953, 'learning_rate': 5.368882931137675e-07, 'epoch': 0.9} 90%|████████▉ | 1105/1230 [21:42:57<2:25:02, 69.62s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2307 [2024-07-31 23:55:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.75 | bwd_microstep: 5358.75 | bwd_inner_microstep: 4946.57 | bwd_allreduce_microstep: 412.11 | step_microstep: 0.08 dynamic ViT batch size: 5, images per sample: 2.5, dynamic token length: 1525 [2024-07-31 23:55:09,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.76 | bwd_microstep: 5312.11 | bwd_inner_microstep: 4901.47 | bwd_allreduce_microstep: 410.57 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-31 23:55:18,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.95 | bwd_microstep: 5355.14 | bwd_inner_microstep: 5252.07 | bwd_allreduce_microstep: 103.00 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2081 [2024-07-31 23:55:27,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.81 | bwd_microstep: 5182.42 | bwd_inner_microstep: 4778.52 | bwd_allreduce_microstep: 403.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 23:55:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.43 | bwd_microstep: 5150.75 | bwd_inner_microstep: 5069.87 | bwd_allreduce_microstep: 80.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-31 23:55:44,961] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3606.63 | bwd_microstep: 5209.04 | bwd_inner_microstep: 5127.52 | bwd_allreduce_microstep: 81.45 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1124 [2024-07-31 23:55:53,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.71 | bwd_microstep: 5264.41 | bwd_inner_microstep: 4858.42 | bwd_allreduce_microstep: 405.92 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147 [2024-07-31 23:56:02,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-31 23:56:02,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.55 | bwd_microstep: 5264.02 | bwd_inner_microstep: 4855.75 | bwd_allreduce_microstep: 408.20 | step_microstep: 182.77 [2024-07-31 23:56:02,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28587.51 | bwd: 42096.62 | bwd_inner: 39790.13 | bwd_allreduce: 2306.00 | step: 183.47 90%|████████▉ | 1106/1230 [21:44:08<2:24:44, 70.04s/it] {'loss': 1.1141, 'learning_rate': 5.284086061734672e-07, 'epoch': 0.9} 90%|████████▉ | 1106/1230 [21:44:08<2:24:44, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4070 [2024-07-31 23:56:11,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.53 | bwd_microstep: 5328.38 | bwd_inner_microstep: 5309.23 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3948 [2024-07-31 23:56:20,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.83 | bwd_microstep: 5161.01 | bwd_inner_microstep: 5141.66 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3844 [2024-07-31 23:56:29,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.64 | bwd_microstep: 5095.06 | 
bwd_inner_microstep: 5075.62 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-31 23:56:38,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.37 | bwd_microstep: 5228.51 | bwd_inner_microstep: 4820.52 | bwd_allreduce_microstep: 407.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-31 23:56:46,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3324.69 | bwd_microstep: 4830.41 | bwd_inner_microstep: 4796.07 | bwd_allreduce_microstep: 34.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-31 23:56:55,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.32 | bwd_microstep: 5062.07 | bwd_inner_microstep: 5040.96 | bwd_allreduce_microstep: 21.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-31 23:57:04,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.14 | bwd_microstep: 5170.26 | bwd_inner_microstep: 4766.75 | bwd_allreduce_microstep: 403.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-31 23:57:13,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-31 23:57:13,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.00 | bwd_microstep: 5070.40 | bwd_inner_microstep: 4677.27 | bwd_allreduce_microstep: 393.07 | step_microstep: 182.51 [2024-07-31 23:57:13,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29100.44 | bwd: 40946.09 | bwd_inner: 39628.02 | bwd_allreduce: 1317.58 | step: 183.09 90%|█████████ | 1107/1230 [21:45:19<2:23:47, 70.14s/it] {'loss': 1.0916, 'learning_rate': 5.199946005061462e-07, 'epoch': 0.9} 90%|█████████ | 1107/1230 
[21:45:19<2:23:47, 70.14s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2339 [2024-07-31 23:57:21,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3160.64 | bwd_microstep: 5363.08 | bwd_inner_microstep: 4951.99 | bwd_allreduce_microstep: 411.02 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3828 [2024-07-31 23:57:30,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.36 | bwd_microstep: 5121.63 | bwd_inner_microstep: 5060.08 | bwd_allreduce_microstep: 61.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3816 [2024-07-31 23:57:39,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.39 | bwd_microstep: 5183.95 | bwd_inner_microstep: 5134.88 | bwd_allreduce_microstep: 49.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-31 23:57:48,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.82 | bwd_microstep: 5154.69 | bwd_inner_microstep: 5098.56 | bwd_allreduce_microstep: 56.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2090 [2024-07-31 23:57:56,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3055.20 | bwd_microstep: 5036.08 | bwd_inner_microstep: 4648.57 | bwd_allreduce_microstep: 387.44 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2104 [2024-07-31 23:58:04,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.86 | bwd_microstep: 5250.56 | bwd_inner_microstep: 4843.04 | bwd_allreduce_microstep: 407.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-31 23:58:13,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.53 | bwd_microstep: 4888.22 | 
bwd_inner_microstep: 4868.88 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-31 23:58:22,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-31 23:58:22,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.72 | bwd_microstep: 5045.04 | bwd_inner_microstep: 4987.40 | bwd_allreduce_microstep: 57.58 | step_microstep: 182.78 [2024-07-31 23:58:22,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27835.41 | bwd: 41043.23 | bwd_inner: 39593.32 | bwd_allreduce: 1449.41 | step: 183.36 90%|█████████ | 1108/1230 [21:46:28<2:22:03, 69.86s/it] {'loss': 1.1653, 'learning_rate': 5.116463344591893e-07, 'epoch': 0.9} 90%|█████████ | 1108/1230 [21:46:28<2:22:03, 69.86s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-07-31 23:58:31,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3855.59 | bwd_microstep: 5348.47 | bwd_inner_microstep: 5329.41 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220 [2024-07-31 23:58:40,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.63 | bwd_microstep: 5257.46 | bwd_inner_microstep: 4847.68 | bwd_allreduce_microstep: 409.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-31 23:58:49,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.54 | bwd_microstep: 5239.96 | bwd_inner_microstep: 4833.99 | bwd_allreduce_microstep: 405.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-31 23:58:58,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.04 | bwd_microstep: 5080.98 | bwd_inner_microstep: 5056.43 | bwd_allreduce_microstep: 24.48 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-31 23:59:06,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.47 | bwd_microstep: 5186.13 | bwd_inner_microstep: 5101.82 | bwd_allreduce_microstep: 84.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-31 23:59:15,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.42 | bwd_microstep: 5046.20 | bwd_inner_microstep: 4654.94 | bwd_allreduce_microstep: 391.20 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2121 [2024-07-31 23:59:24,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.75 | bwd_microstep: 5089.14 | bwd_inner_microstep: 4693.75 | bwd_allreduce_microstep: 395.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-31 23:59:32,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-31 23:59:32,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.53 | bwd_microstep: 5038.54 | bwd_inner_microstep: 4982.37 | bwd_allreduce_microstep: 56.11 | step_microstep: 181.25 [2024-07-31 23:59:32,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28920.88 | bwd: 41286.87 | bwd_inner: 39500.34 | bwd_allreduce: 1786.05 | step: 181.83 90%|█████████ | 1109/1230 [21:47:38<2:21:17, 70.07s/it] {'loss': 1.1423, 'learning_rate': 5.033638659241102e-07, 'epoch': 0.9} 90%|█████████ | 1109/1230 [21:47:38<2:21:17, 70.07s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2484 [2024-07-31 23:59:42,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.11 | bwd_microstep: 5532.89 | bwd_inner_microstep: 5109.19 | bwd_allreduce_microstep: 423.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, 
dynamic token length: 2254 [2024-07-31 23:59:51,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.08 | bwd_microstep: 5328.43 | bwd_inner_microstep: 4915.45 | bwd_allreduce_microstep: 412.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-31 23:59:59,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.54 | bwd_microstep: 5041.91 | bwd_inner_microstep: 5022.52 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3726 [2024-08-01 00:00:08,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.46 | bwd_microstep: 5170.45 | bwd_inner_microstep: 5080.44 | bwd_allreduce_microstep: 89.94 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2192 [2024-08-01 00:00:17,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.60 | bwd_microstep: 5155.16 | bwd_inner_microstep: 4753.66 | bwd_allreduce_microstep: 401.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-08-01 00:00:26,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.13 | bwd_microstep: 5010.56 | bwd_inner_microstep: 4955.38 | bwd_allreduce_microstep: 55.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-08-01 00:00:34,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.43 | bwd_microstep: 4890.62 | bwd_inner_microstep: 4871.25 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-08-01 00:00:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:00:43,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.02 | 
bwd_microstep: 4922.45 | bwd_inner_microstep: 4895.91 | bwd_allreduce_microstep: 26.47 | step_microstep: 181.74 [2024-08-01 00:00:43,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29159.26 | bwd: 41052.46 | bwd_inner: 39603.74 | bwd_allreduce: 1448.23 | step: 182.33 90%|█████████ | 1110/1230 [21:48:49<2:20:25, 70.21s/it] {'loss': 1.1569, 'learning_rate': 4.951472523361401e-07, 'epoch': 0.9} 90%|█████████ | 1110/1230 [21:48:49<2:20:25, 70.21s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3856 [2024-08-01 00:00:52,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3860.47 | bwd_microstep: 5446.33 | bwd_inner_microstep: 5380.54 | bwd_allreduce_microstep: 65.71 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-08-01 00:01:01,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.37 | bwd_microstep: 5241.29 | bwd_inner_microstep: 4835.60 | bwd_allreduce_microstep: 405.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-08-01 00:01:10,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.78 | bwd_microstep: 5037.55 | bwd_inner_microstep: 5015.78 | bwd_allreduce_microstep: 21.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-08-01 00:01:19,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.39 | bwd_microstep: 5079.21 | bwd_inner_microstep: 5035.01 | bwd_allreduce_microstep: 44.13 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-08-01 00:01:27,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.84 | bwd_microstep: 5030.16 | bwd_inner_microstep: 4988.54 | bwd_allreduce_microstep: 41.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2157 [2024-08-01 00:01:35,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.99 | bwd_microstep: 4893.68 | bwd_inner_microstep: 4518.07 | bwd_allreduce_microstep: 375.54 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-08-01 00:01:44,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.09 | bwd_microstep: 5076.80 | bwd_inner_microstep: 5012.80 | bwd_allreduce_microstep: 63.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-08-01 00:01:53,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:01:53,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.83 | bwd_microstep: 4883.59 | bwd_inner_microstep: 4862.47 | bwd_allreduce_microstep: 21.05 | step_microstep: 181.64 [2024-08-01 00:01:53,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28602.65 | bwd: 40688.57 | bwd_inner: 39648.75 | bwd_allreduce: 1039.34 | step: 182.33 90%|█████████ | 1111/1230 [21:49:58<2:18:54, 70.04s/it] {'loss': 1.079, 'learning_rate': 4.869965506738416e-07, 'epoch': 0.9} 90%|█████████ | 1111/1230 [21:49:58<2:18:54, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3939 [2024-08-01 00:02:02,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3860.53 | bwd_microstep: 5417.07 | bwd_inner_microstep: 5372.14 | bwd_allreduce_microstep: 44.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2307 [2024-08-01 00:02:11,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.95 | bwd_microstep: 5384.26 | bwd_inner_microstep: 4968.10 | bwd_allreduce_microstep: 416.10 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3610 [2024-08-01 00:02:20,109] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.78 | bwd_microstep: 5122.88 | bwd_inner_microstep: 5026.49 | bwd_allreduce_microstep: 96.33 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2184 [2024-08-01 00:02:28,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.67 | bwd_microstep: 5186.01 | bwd_inner_microstep: 4782.90 | bwd_allreduce_microstep: 403.04 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-08-01 00:02:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.07 | bwd_microstep: 5104.61 | bwd_inner_microstep: 5057.59 | bwd_allreduce_microstep: 46.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-08-01 00:02:46,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.79 | bwd_microstep: 5195.44 | bwd_inner_microstep: 5135.38 | bwd_allreduce_microstep: 59.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-08-01 00:02:55,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.28 | bwd_microstep: 4993.66 | bwd_inner_microstep: 4974.24 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-08-01 00:03:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 00:03:03,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.89 | bwd_microstep: 4895.35 | bwd_inner_microstep: 4516.00 | bwd_allreduce_microstep: 379.27 | step_microstep: 181.45 [2024-08-01 00:03:03,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28734.85 | bwd: 41299.26 | bwd_inner: 39832.78 | bwd_allreduce: 1465.99 | step: 182.03 90%|█████████ | 1112/1230 [21:51:09<2:17:55, 70.13s/it] {'loss': 1.1044, 
'learning_rate': 4.789118174587048e-07, 'epoch': 0.9} 90%|█████████ | 1112/1230 [21:51:09<2:17:55, 70.13s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3936 [2024-08-01 00:03:12,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.14 | bwd_microstep: 5185.83 | bwd_inner_microstep: 5166.79 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-08-01 00:03:21,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.22 | bwd_microstep: 5253.32 | bwd_inner_microstep: 4846.10 | bwd_allreduce_microstep: 407.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-08-01 00:03:30,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.60 | bwd_microstep: 5217.98 | bwd_inner_microstep: 5177.33 | bwd_allreduce_microstep: 40.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-08-01 00:03:39,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.06 | bwd_microstep: 5061.40 | bwd_inner_microstep: 5035.88 | bwd_allreduce_microstep: 25.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-08-01 00:03:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.62 | bwd_microstep: 5129.38 | bwd_inner_microstep: 4729.57 | bwd_allreduce_microstep: 399.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-08-01 00:03:55,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.95 | bwd_microstep: 4971.14 | bwd_inner_microstep: 4589.30 | bwd_allreduce_microstep: 381.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-08-01 00:04:03,961] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.79 | bwd_microstep: 4858.70 | bwd_inner_microstep: 4814.08 | bwd_allreduce_microstep: 44.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-08-01 00:04:12,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-08-01 00:04:12,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.19 | bwd_microstep: 5118.32 | bwd_inner_microstep: 5044.89 | bwd_allreduce_microstep: 73.36 | step_microstep: 181.76 [2024-08-01 00:04:12,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28270.48 | bwd: 40796.06 | bwd_inner: 39403.88 | bwd_allreduce: 1391.70 | step: 182.34 90%|█████████ | 1113/1230 [21:52:18<2:16:19, 69.91s/it] {'loss': 1.1143, 'learning_rate': 4.7089310875475967e-07, 'epoch': 0.9} 90%|█████████ | 1113/1230 [21:52:18<2:16:19, 69.91s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3852 [2024-08-01 00:04:22,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3865.27 | bwd_microstep: 5448.41 | bwd_inner_microstep: 5379.73 | bwd_allreduce_microstep: 68.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3953 [2024-08-01 00:04:31,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.98 | bwd_microstep: 5176.88 | bwd_inner_microstep: 5157.62 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2048 [2024-08-01 00:04:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.68 | bwd_microstep: 5153.20 | bwd_inner_microstep: 4757.65 | bwd_allreduce_microstep: 395.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-08-01 00:04:48,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.05 | 
bwd_microstep: 5231.39 | bwd_inner_microstep: 5145.65 | bwd_allreduce_microstep: 85.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-08-01 00:04:57,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.75 | bwd_microstep: 5049.99 | bwd_inner_microstep: 4656.49 | bwd_allreduce_microstep: 393.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-08-01 00:05:06,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.82 | bwd_microstep: 5208.39 | bwd_inner_microstep: 4802.61 | bwd_allreduce_microstep: 405.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-08-01 00:05:14,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.40 | bwd_microstep: 5176.44 | bwd_inner_microstep: 4773.47 | bwd_allreduce_microstep: 402.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-08-01 00:05:23,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 00:05:23,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.75 | bwd_microstep: 5140.32 | bwd_inner_microstep: 5071.53 | bwd_allreduce_microstep: 68.72 | step_microstep: 181.24 [2024-08-01 00:05:23,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28970.61 | bwd: 41585.00 | bwd_inner: 39744.68 | bwd_allreduce: 1839.83 | step: 181.82 91%|█████████ | 1114/1230 [21:53:29<2:15:43, 70.20s/it] {'loss': 1.1438, 'learning_rate': 4.629404801681803e-07, 'epoch': 0.91} 91%|█████████ | 1114/1230 [21:53:29<2:15:43, 70.20s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2401 [2024-08-01 00:05:32,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.04 | bwd_microstep: 5500.81 | bwd_inner_microstep: 5077.94 | 
bwd_allreduce_microstep: 422.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4004 [2024-08-01 00:05:41,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.45 | bwd_microstep: 5140.76 | bwd_inner_microstep: 5114.13 | bwd_allreduce_microstep: 26.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-08-01 00:05:50,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.51 | bwd_microstep: 5136.39 | bwd_inner_microstep: 5060.45 | bwd_allreduce_microstep: 75.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-08-01 00:05:59,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.47 | bwd_microstep: 5042.65 | bwd_inner_microstep: 5018.07 | bwd_allreduce_microstep: 24.51 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2603 [2024-08-01 00:06:07,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.55 | bwd_microstep: 5143.39 | bwd_inner_microstep: 4742.24 | bwd_allreduce_microstep: 401.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-08-01 00:06:16,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.37 | bwd_microstep: 5022.94 | bwd_inner_microstep: 4998.45 | bwd_allreduce_microstep: 24.41 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-08-01 00:06:25,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.52 | bwd_microstep: 5175.44 | bwd_inner_microstep: 5100.22 | bwd_allreduce_microstep: 75.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-08-01 00:06:34,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 
[2024-08-01 00:06:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.54 | bwd_microstep: 5016.75 | bwd_inner_microstep: 4963.88 | bwd_allreduce_microstep: 52.80 | step_microstep: 182.01 [2024-08-01 00:06:34,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29084.35 | bwd: 41179.11 | bwd_inner: 40075.32 | bwd_allreduce: 1103.31 | step: 182.60 91%|█████████ | 1115/1230 [21:54:40<2:14:47, 70.32s/it] {'loss': 1.0748, 'learning_rate': 4.550539868469106e-07, 'epoch': 0.91} 91%|█████████ | 1115/1230 [21:54:40<2:14:47, 70.32s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4019 [2024-08-01 00:06:43,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3846.73 | bwd_microstep: 5262.87 | bwd_inner_microstep: 5243.74 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3564 [2024-08-01 00:06:52,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.82 | bwd_microstep: 5256.05 | bwd_inner_microstep: 5113.22 | bwd_allreduce_microstep: 142.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-08-01 00:07:00,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3160.15 | bwd_microstep: 4757.76 | bwd_inner_microstep: 4718.53 | bwd_allreduce_microstep: 39.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-08-01 00:07:09,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.97 | bwd_microstep: 5063.12 | bwd_inner_microstep: 5043.87 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3601 [2024-08-01 00:07:17,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.15 | bwd_microstep: 5136.75 | bwd_inner_microstep: 5076.08 | 
bwd_allreduce_microstep: 60.61 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-08-01 00:07:26,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.85 | bwd_microstep: 5083.29 | bwd_inner_microstep: 5039.81 | bwd_allreduce_microstep: 43.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-08-01 00:07:35,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.62 | bwd_microstep: 4987.51 | bwd_inner_microstep: 4950.24 | bwd_allreduce_microstep: 37.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-08-01 00:07:44,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 00:07:44,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.36 | bwd_microstep: 4931.46 | bwd_inner_microstep: 4906.45 | bwd_allreduce_microstep: 24.95 | step_microstep: 181.65 [2024-08-01 00:07:44,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28997.56 | bwd: 40478.80 | bwd_inner: 40091.86 | bwd_allreduce: 386.44 | step: 182.33 91%|█████████ | 1116/1230 [21:55:50<2:13:19, 70.17s/it] {'loss': 1.1078, 'learning_rate': 4.4723368348027375e-07, 'epoch': 0.91} 91%|█████████ | 1116/1230 [21:55:50<2:13:19, 70.17s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2446 [2024-08-01 00:07:53,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.58 | bwd_microstep: 5286.58 | bwd_inner_microstep: 4878.34 | bwd_allreduce_microstep: 408.17 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-08-01 00:08:01,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.36 | bwd_microstep: 5054.13 | bwd_inner_microstep: 5034.88 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 3767 [2024-08-01 00:08:10,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.37 | bwd_microstep: 5188.32 | bwd_inner_microstep: 5110.63 | bwd_allreduce_microstep: 77.61 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2078 [2024-08-01 00:08:19,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.66 | bwd_microstep: 5215.38 | bwd_inner_microstep: 4810.82 | bwd_allreduce_microstep: 404.49 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2182 [2024-08-01 00:08:28,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.45 | bwd_microstep: 5184.17 | bwd_inner_microstep: 4781.02 | bwd_allreduce_microstep: 403.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-08-01 00:08:36,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.07 | bwd_microstep: 4911.95 | bwd_inner_microstep: 4534.97 | bwd_allreduce_microstep: 376.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-08-01 00:08:44,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.63 | bwd_microstep: 5040.20 | bwd_inner_microstep: 4998.15 | bwd_allreduce_microstep: 41.99 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2101 [2024-08-01 00:08:52,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 00:08:52,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.87 | bwd_microstep: 4897.50 | bwd_inner_microstep: 4519.80 | bwd_allreduce_microstep: 377.63 | step_microstep: 181.54 [2024-08-01 00:08:52,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27713.89 | bwd: 40778.22 | bwd_inner: 38668.56 | bwd_allreduce: 
2109.17 | step: 182.13 91%|█████████ | 1117/1230 [21:56:58<2:11:23, 69.76s/it] {'loss': 1.1889, 'learning_rate': 4.394796242985955e-07, 'epoch': 0.91} 91%|█████████ | 1117/1230 [21:56:58<2:11:23, 69.76s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2373 [2024-08-01 00:09:02,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.93 | bwd_microstep: 5625.03 | bwd_inner_microstep: 5194.17 | bwd_allreduce_microstep: 430.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-08-01 00:09:10,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3113.28 | bwd_microstep: 5228.13 | bwd_inner_microstep: 4826.65 | bwd_allreduce_microstep: 401.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-08-01 00:09:19,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.95 | bwd_microstep: 5371.23 | bwd_inner_microstep: 5269.27 | bwd_allreduce_microstep: 101.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-08-01 00:09:28,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.20 | bwd_microstep: 5027.35 | bwd_inner_microstep: 5007.95 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-08-01 00:09:37,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.31 | bwd_microstep: 5179.30 | bwd_inner_microstep: 5095.82 | bwd_allreduce_microstep: 83.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-08-01 00:09:46,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.21 | bwd_microstep: 5138.50 | bwd_inner_microstep: 5070.66 | bwd_allreduce_microstep: 67.77 | step_microstep: 0.08 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3658 [2024-08-01 00:09:54,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.55 | bwd_microstep: 5136.85 | bwd_inner_microstep: 5069.76 | bwd_allreduce_microstep: 67.02 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2139 [2024-08-01 00:10:03,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-08-01 00:10:03,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.22 | bwd_microstep: 5117.26 | bwd_inner_microstep: 4721.78 | bwd_allreduce_microstep: 395.42 | step_microstep: 182.15 [2024-08-01 00:10:03,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28543.55 | bwd: 41823.63 | bwd_inner: 40256.00 | bwd_allreduce: 1567.14 | step: 182.72 91%|█████████ | 1118/1230 [21:58:09<2:10:44, 70.04s/it] {'loss': 1.1545, 'learning_rate': 4.317918630728224e-07, 'epoch': 0.91} 91%|█████████ | 1118/1230 [21:58:09<2:10:44, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 00:10:12,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3893.50 | bwd_microstep: 5360.19 | bwd_inner_microstep: 5341.06 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3743 [2024-08-01 00:10:21,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.07 | bwd_microstep: 5073.92 | bwd_inner_microstep: 5037.55 | bwd_allreduce_microstep: 36.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-08-01 00:10:30,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.52 | bwd_microstep: 5173.49 | bwd_inner_microstep: 5087.66 | bwd_allreduce_microstep: 85.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-08-01 
00:10:39,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.68 | bwd_microstep: 5248.77 | bwd_inner_microstep: 5161.79 | bwd_allreduce_microstep: 86.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-08-01 00:10:48,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.16 | bwd_microstep: 5233.40 | bwd_inner_microstep: 4831.27 | bwd_allreduce_microstep: 402.06 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2092 [2024-08-01 00:10:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.12 | bwd_microstep: 5212.45 | bwd_inner_microstep: 4806.73 | bwd_allreduce_microstep: 405.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-08-01 00:11:05,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.38 | bwd_microstep: 5061.53 | bwd_inner_microstep: 4997.16 | bwd_allreduce_microstep: 64.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-08-01 00:11:14,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-08-01 00:11:14,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.89 | bwd_microstep: 4950.27 | bwd_inner_microstep: 4907.38 | bwd_allreduce_microstep: 42.81 | step_microstep: 181.87 [2024-08-01 00:11:14,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29019.22 | bwd: 41313.99 | bwd_inner: 40170.55 | bwd_allreduce: 1142.95 | step: 182.45 91%|█████████ | 1119/1230 [21:59:20<2:09:55, 70.23s/it] {'loss': 1.1793, 'learning_rate': 4.241704531141644e-07, 'epoch': 0.91} 91%|█████████ | 1119/1230 [21:59:20<2:09:55, 70.23s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4020 [2024-08-01 00:11:23,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3694.72 | bwd_microstep: 5152.83 | bwd_inner_microstep: 5129.82 | bwd_allreduce_microstep: 22.94 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3584 [2024-08-01 00:11:32,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.36 | bwd_microstep: 5176.25 | bwd_inner_microstep: 5072.69 | bwd_allreduce_microstep: 103.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830 [2024-08-01 00:11:40,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.81 | bwd_microstep: 5051.75 | bwd_inner_microstep: 5032.40 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-08-01 00:11:49,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.77 | bwd_microstep: 5203.05 | bwd_inner_microstep: 5147.10 | bwd_allreduce_microstep: 55.88 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-08-01 00:11:58,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.50 | bwd_microstep: 4976.22 | bwd_inner_microstep: 4956.90 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-08-01 00:12:07,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.76 | bwd_microstep: 5017.02 | bwd_inner_microstep: 4977.71 | bwd_allreduce_microstep: 39.23 | step_microstep: 0.18 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2121 [2024-08-01 00:12:15,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.74 | bwd_microstep: 5172.25 | bwd_inner_microstep: 4772.69 | bwd_allreduce_microstep: 399.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-08-01 
00:12:24,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 00:12:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.25 | bwd_microstep: 5049.80 | bwd_inner_microstep: 4658.80 | bwd_allreduce_microstep: 390.93 | step_microstep: 181.68 [2024-08-01 00:12:24,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29003.81 | bwd: 40799.15 | bwd_inner: 39748.05 | bwd_allreduce: 1050.59 | step: 182.37 91%|█████████ | 1120/1230 [22:00:30<2:08:42, 70.20s/it] {'loss': 1.14, 'learning_rate': 4.166154472737061e-07, 'epoch': 0.91} 91%|█████████ | 1120/1230 [22:00:30<2:08:42, 70.20s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3952 [2024-08-01 00:12:33,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.10 | bwd_microstep: 5302.53 | bwd_inner_microstep: 5248.96 | bwd_allreduce_microstep: 53.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3851 [2024-08-01 00:12:42,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.77 | bwd_microstep: 5290.73 | bwd_inner_microstep: 5231.45 | bwd_allreduce_microstep: 59.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-08-01 00:12:50,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.39 | bwd_microstep: 4891.92 | bwd_inner_microstep: 4872.62 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-08-01 00:12:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.43 | bwd_microstep: 4781.22 | bwd_inner_microstep: 4745.05 | bwd_allreduce_microstep: 36.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3643 [2024-08-01 00:13:07,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3616.24 | bwd_microstep: 5219.96 | bwd_inner_microstep: 5122.61 | bwd_allreduce_microstep: 97.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-08-01 00:13:16,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.37 | bwd_microstep: 5178.59 | bwd_inner_microstep: 5124.58 | bwd_allreduce_microstep: 53.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-08-01 00:13:25,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.33 | bwd_microstep: 4893.76 | bwd_inner_microstep: 4874.44 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-08-01 00:13:34,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:13:34,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.86 | bwd_microstep: 5022.59 | bwd_inner_microstep: 4972.48 | bwd_allreduce_microstep: 50.05 | step_microstep: 181.51 [2024-08-01 00:13:34,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28629.40 | bwd: 40581.28 | bwd_inner: 40192.15 | bwd_allreduce: 388.65 | step: 182.09 91%|█████████ | 1121/1230 [22:01:39<2:07:10, 70.00s/it] {'loss': 1.1623, 'learning_rate': 4.0912689794205483e-07, 'epoch': 0.91} 91%|█████████ | 1121/1230 [22:01:39<2:07:10, 70.00s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3817 [2024-08-01 00:13:43,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.60 | bwd_microstep: 5576.87 | bwd_inner_microstep: 5476.93 | bwd_allreduce_microstep: 99.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3563 [2024-08-01 00:13:52,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.80 | bwd_microstep: 5188.81 | bwd_inner_microstep: 
5095.41 | bwd_allreduce_microstep: 93.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234 [2024-08-01 00:14:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.42 | bwd_microstep: 5212.98 | bwd_inner_microstep: 4806.11 | bwd_allreduce_microstep: 406.80 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 11.0, dynamic token length: 3594 [2024-08-01 00:14:09,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3336.28 | bwd_microstep: 4783.32 | bwd_inner_microstep: 4756.12 | bwd_allreduce_microstep: 27.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-08-01 00:14:17,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.93 | bwd_microstep: 5124.53 | bwd_inner_microstep: 4728.03 | bwd_allreduce_microstep: 396.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-08-01 00:14:26,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.25 | bwd_microstep: 4992.48 | bwd_inner_microstep: 4956.99 | bwd_allreduce_microstep: 35.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3675 [2024-08-01 00:14:35,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.63 | bwd_microstep: 4985.34 | bwd_inner_microstep: 4920.49 | bwd_allreduce_microstep: 64.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-08-01 00:14:43,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:14:43,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.21 | bwd_microstep: 4934.94 | bwd_inner_microstep: 4907.60 | bwd_allreduce_microstep: 27.27 | step_microstep: 181.26 [2024-08-01 00:14:43,881] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 28745.01 | bwd: 40799.26 | bwd_inner: 39647.61 | bwd_allreduce: 1151.17 | step: 181.83 91%|█████████ | 1122/1230 [22:02:49<2:05:56, 69.97s/it] {'loss': 1.1723, 'learning_rate': 4.0170485704896453e-07, 'epoch': 0.91} 91%|█████████ | 1122/1230 [22:02:49<2:05:56, 69.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3948 [2024-08-01 00:14:52,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.27 | bwd_microstep: 5179.77 | bwd_inner_microstep: 5160.70 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3846 [2024-08-01 00:15:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.56 | bwd_microstep: 5102.01 | bwd_inner_microstep: 5082.65 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-08-01 00:15:10,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.21 | bwd_microstep: 5044.23 | bwd_inner_microstep: 5015.04 | bwd_allreduce_microstep: 29.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-08-01 00:15:19,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.64 | bwd_microstep: 5182.89 | bwd_inner_microstep: 5107.23 | bwd_allreduce_microstep: 75.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3776 [2024-08-01 00:15:28,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.74 | bwd_microstep: 5137.92 | bwd_inner_microstep: 5064.21 | bwd_allreduce_microstep: 73.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-08-01 00:15:36,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.24 | bwd_microstep: 5027.06 | bwd_inner_microstep: 4974.50 | 
bwd_allreduce_microstep: 52.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-08-01 00:15:45,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.89 | bwd_microstep: 5045.18 | bwd_inner_microstep: 4983.98 | bwd_allreduce_microstep: 61.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-08-01 00:15:54,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-08-01 00:15:54,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.39 | bwd_microstep: 4881.47 | bwd_inner_microstep: 4862.13 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.34 [2024-08-01 00:15:54,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29402.86 | bwd: 40600.51 | bwd_inner: 40250.39 | bwd_allreduce: 349.64 | step: 181.90 91%|█████████▏| 1123/1230 [22:04:00<2:04:58, 70.08s/it] {'loss': 1.0986, 'learning_rate': 3.943493760629924e-07, 'epoch': 0.91} 91%|█████████▏| 1123/1230 [22:04:00<2:04:58, 70.08s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3494 [2024-08-01 00:16:03,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.22 | bwd_microstep: 5437.54 | bwd_inner_microstep: 5258.27 | bwd_allreduce_microstep: 179.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3842 [2024-08-01 00:16:12,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3813.66 | bwd_microstep: 5342.77 | bwd_inner_microstep: 5288.11 | bwd_allreduce_microstep: 54.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-08-01 00:16:21,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.45 | bwd_microstep: 5122.13 | bwd_inner_microstep: 5075.80 | bwd_allreduce_microstep: 46.26 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-08-01 00:16:30,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.37 | bwd_microstep: 5002.08 | bwd_inner_microstep: 4982.72 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3866 [2024-08-01 00:16:38,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.59 | bwd_microstep: 5136.36 | bwd_inner_microstep: 5077.26 | bwd_allreduce_microstep: 59.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-08-01 00:16:47,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.80 | bwd_microstep: 5001.11 | bwd_inner_microstep: 4966.81 | bwd_allreduce_microstep: 34.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3662 [2024-08-01 00:16:56,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.87 | bwd_microstep: 5072.20 | bwd_inner_microstep: 4989.00 | bwd_allreduce_microstep: 83.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-08-01 00:17:04,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 00:17:04,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.74 | bwd_microstep: 4916.68 | bwd_inner_microstep: 4892.48 | bwd_allreduce_microstep: 24.13 | step_microstep: 181.51 [2024-08-01 00:17:04,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29322.58 | bwd: 41030.86 | bwd_inner: 40530.40 | bwd_allreduce: 499.97 | step: 182.09 91%|█████████▏| 1124/1230 [22:05:10<2:04:07, 70.26s/it] {'loss': 1.0803, 'learning_rate': 3.8706050599112363e-07, 'epoch': 0.91} 91%|█████████▏| 1124/1230 [22:05:10<2:04:07, 70.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3853 [2024-08-01 
00:17:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.17 | bwd_microstep: 5570.76 | bwd_inner_microstep: 5474.09 | bwd_allreduce_microstep: 96.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3808 [2024-08-01 00:17:23,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.94 | bwd_microstep: 5154.00 | bwd_inner_microstep: 5081.56 | bwd_allreduce_microstep: 72.37 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-08-01 00:17:31,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.48 | bwd_microstep: 4999.30 | bwd_inner_microstep: 4979.99 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3758 [2024-08-01 00:17:40,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.68 | bwd_microstep: 5158.18 | bwd_inner_microstep: 5118.86 | bwd_allreduce_microstep: 39.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-08-01 00:17:49,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.83 | bwd_microstep: 5187.52 | bwd_inner_microstep: 5110.29 | bwd_allreduce_microstep: 77.16 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2109 [2024-08-01 00:17:57,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.17 | bwd_microstep: 5062.71 | bwd_inner_microstep: 4670.97 | bwd_allreduce_microstep: 391.67 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-08-01 00:18:06,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.76 | bwd_microstep: 5057.30 | bwd_inner_microstep: 4996.35 | bwd_allreduce_microstep: 60.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3714 [2024-08-01 00:18:15,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 00:18:15,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.30 | bwd_microstep: 5041.44 | bwd_inner_microstep: 5000.45 | bwd_allreduce_microstep: 40.93 | step_microstep: 182.92 [2024-08-01 00:18:15,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28969.25 | bwd: 41231.18 | bwd_inner: 40432.49 | bwd_allreduce: 798.20 | step: 183.60 91%|█████████▏| 1125/1230 [22:06:21<2:03:06, 70.35s/it] {'loss': 1.1279, 'learning_rate': 3.798382973784298e-07, 'epoch': 0.91} 91%|█████████▏| 1125/1230 [22:06:21<2:03:06, 70.35s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-08-01 00:18:24,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.09 | bwd_microstep: 5274.65 | bwd_inner_microstep: 5238.83 | bwd_allreduce_microstep: 35.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3904 [2024-08-01 00:18:32,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3273.20 | bwd_microstep: 4939.03 | bwd_inner_microstep: 4919.70 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-08-01 00:18:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.76 | bwd_microstep: 5054.97 | bwd_inner_microstep: 5035.55 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 [2024-08-01 00:18:50,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.59 | bwd_microstep: 5255.45 | bwd_inner_microstep: 4844.93 | bwd_allreduce_microstep: 410.46 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2184 [2024-08-01 00:18:59,406] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.43 | bwd_microstep: 5311.65 | bwd_inner_microstep: 4902.56 | bwd_allreduce_microstep: 409.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-08-01 00:19:08,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.10 | bwd_microstep: 5042.60 | bwd_inner_microstep: 4974.60 | bwd_allreduce_microstep: 67.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-08-01 00:19:16,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.78 | bwd_microstep: 5187.99 | bwd_inner_microstep: 5109.99 | bwd_allreduce_microstep: 77.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-08-01 00:19:25,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 00:19:25,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.65 | bwd_microstep: 5097.38 | bwd_inner_microstep: 4704.59 | bwd_allreduce_microstep: 392.72 | step_microstep: 181.63 [2024-08-01 00:19:25,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28744.50 | bwd: 41163.69 | bwd_inner: 39730.67 | bwd_allreduce: 1432.53 | step: 182.20 92%|█████████▏| 1126/1230 [22:07:31<2:01:52, 70.31s/it] {'loss': 1.1194, 'learning_rate': 3.7268280030771655e-07, 'epoch': 0.92} 92%|█████████▏| 1126/1230 [22:07:31<2:01:52, 70.31s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3647 [2024-08-01 00:19:34,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3707.37 | bwd_microstep: 5493.32 | bwd_inner_microstep: 5330.76 | bwd_allreduce_microstep: 162.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-08-01 00:19:43,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3252.05 | bwd_microstep: 4922.51 | bwd_inner_microstep: 4866.90 | bwd_allreduce_microstep: 55.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-08-01 00:19:51,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.22 | bwd_microstep: 5071.24 | bwd_inner_microstep: 4682.64 | bwd_allreduce_microstep: 388.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-08-01 00:19:59,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.96 | bwd_microstep: 4998.05 | bwd_inner_microstep: 4978.75 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-08-01 00:20:08,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.79 | bwd_microstep: 4796.10 | bwd_inner_microstep: 4776.68 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-08-01 00:20:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.12 | bwd_microstep: 4772.98 | bwd_inner_microstep: 4739.00 | bwd_allreduce_microstep: 33.92 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-08-01 00:20:24,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.45 | bwd_microstep: 5048.23 | bwd_inner_microstep: 5005.00 | bwd_allreduce_microstep: 43.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-08-01 00:20:33,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-08-01 00:20:33,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.15 | bwd_microstep: 5000.43 | bwd_inner_microstep: 4981.04 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.96 [2024-08-01 
00:20:33,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27669.01 | bwd: 40102.85 | bwd_inner: 39360.70 | bwd_allreduce: 741.66 | step: 182.54 92%|█████████▏| 1127/1230 [22:08:39<1:59:33, 69.65s/it] {'loss': 1.1201, 'learning_rate': 3.655940643991729e-07, 'epoch': 0.92} 92%|█████████▏| 1127/1230 [22:08:39<1:59:33, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 00:20:43,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3875.26 | bwd_microstep: 5338.92 | bwd_inner_microstep: 5319.87 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-08-01 00:20:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.21 | bwd_microstep: 5208.29 | bwd_inner_microstep: 5155.22 | bwd_allreduce_microstep: 53.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3862 [2024-08-01 00:21:00,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3399.96 | bwd_microstep: 4921.05 | bwd_inner_microstep: 4901.63 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2866 [2024-08-01 00:21:08,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.14 | bwd_microstep: 5141.93 | bwd_inner_microstep: 4740.90 | bwd_allreduce_microstep: 400.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-08-01 00:21:17,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.85 | bwd_microstep: 5016.97 | bwd_inner_microstep: 4997.60 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-08-01 00:21:26,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.60 | 
bwd_microstep: 5089.84 | bwd_inner_microstep: 5044.18 | bwd_allreduce_microstep: 45.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-08-01 00:21:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.91 | bwd_microstep: 4925.80 | bwd_inner_microstep: 4901.59 | bwd_allreduce_microstep: 24.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-08-01 00:21:43,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:21:43,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.28 | bwd_microstep: 5053.02 | bwd_inner_microstep: 4993.77 | bwd_allreduce_microstep: 59.18 | step_microstep: 181.49 [2024-08-01 00:21:43,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29034.10 | bwd: 40695.81 | bwd_inner: 40054.70 | bwd_allreduce: 640.61 | step: 182.06 92%|█████████▏| 1128/1230 [22:09:49<1:58:36, 69.77s/it] {'loss': 1.1357, 'learning_rate': 3.585721388100283e-07, 'epoch': 0.92} 92%|█████████▏| 1128/1230 [22:09:49<1:58:36, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2347 [2024-08-01 00:21:53,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.90 | bwd_microstep: 5619.57 | bwd_inner_microstep: 5189.33 | bwd_allreduce_microstep: 430.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-08-01 00:22:02,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.42 | bwd_microstep: 5235.18 | bwd_inner_microstep: 5149.22 | bwd_allreduce_microstep: 85.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-08-01 00:22:10,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.82 | bwd_microstep: 5156.53 | bwd_inner_microstep: 5082.92 | 
bwd_allreduce_microstep: 73.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777 [2024-08-01 00:22:19,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.52 | bwd_microstep: 5031.95 | bwd_inner_microstep: 5012.60 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-08-01 00:22:28,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.50 | bwd_microstep: 5165.57 | bwd_inner_microstep: 5109.55 | bwd_allreduce_microstep: 55.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-08-01 00:22:36,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3188.09 | bwd_microstep: 4705.71 | bwd_inner_microstep: 4684.53 | bwd_allreduce_microstep: 21.12 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2118 [2024-08-01 00:22:44,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.47 | bwd_microstep: 5035.36 | bwd_inner_microstep: 4646.27 | bwd_allreduce_microstep: 389.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-08-01 00:22:53,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 00:22:53,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.92 | bwd_microstep: 5060.42 | bwd_inner_microstep: 4668.33 | bwd_allreduce_microstep: 392.02 | step_microstep: 184.26 [2024-08-01 00:22:53,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28407.56 | bwd: 41010.27 | bwd_inner: 39542.70 | bwd_allreduce: 1467.08 | step: 184.94 92%|█████████▏| 1129/1230 [22:10:59<1:57:26, 69.77s/it] {'loss': 1.1804, 'learning_rate': 3.516170722342127e-07, 'epoch': 0.92} 92%|█████████▏| 1129/1230 [22:10:59<1:57:26, 69.77s/it]dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3811 [2024-08-01 00:23:02,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.17 | bwd_microstep: 5504.79 | bwd_inner_microstep: 5415.96 | bwd_allreduce_microstep: 88.76 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2033 [2024-08-01 00:23:11,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.37 | bwd_microstep: 5158.62 | bwd_inner_microstep: 4758.11 | bwd_allreduce_microstep: 400.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-08-01 00:23:19,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.55 | bwd_microstep: 4897.14 | bwd_inner_microstep: 4845.78 | bwd_allreduce_microstep: 51.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-08-01 00:23:27,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.24 | bwd_microstep: 4850.35 | bwd_inner_microstep: 4804.21 | bwd_allreduce_microstep: 46.07 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3624 [2024-08-01 00:23:36,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.72 | bwd_microstep: 5084.92 | bwd_inner_microstep: 5001.14 | bwd_allreduce_microstep: 83.72 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3709 [2024-08-01 00:23:45,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.29 | bwd_microstep: 5028.63 | bwd_inner_microstep: 4962.38 | bwd_allreduce_microstep: 66.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-08-01 00:23:53,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.53 | bwd_microstep: 5031.72 | bwd_inner_microstep: 4978.69 | 
bwd_allreduce_microstep: 52.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 00:24:02,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 00:24:02,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.23 | bwd_microstep: 4994.93 | bwd_inner_microstep: 4941.60 | bwd_allreduce_microstep: 53.26 | step_microstep: 209.30 [2024-08-01 00:24:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27855.00 | bwd: 40551.08 | bwd_inner: 39707.80 | bwd_allreduce: 842.80 | step: 209.89 92%|█████████▏| 1130/1230 [22:12:08<1:55:46, 69.47s/it] {'loss': 1.0892, 'learning_rate': 3.4472891290201927e-07, 'epoch': 0.92} 92%|█████████▏| 1130/1230 [22:12:08<1:55:46, 69.47s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2392 [2024-08-01 00:24:11,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.35 | bwd_microstep: 5525.19 | bwd_inner_microstep: 5099.93 | bwd_allreduce_microstep: 425.19 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304 [2024-08-01 00:24:19,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3127.28 | bwd_microstep: 5196.30 | bwd_inner_microstep: 4798.88 | bwd_allreduce_microstep: 397.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-08-01 00:24:28,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.62 | bwd_microstep: 5299.31 | bwd_inner_microstep: 4887.77 | bwd_allreduce_microstep: 411.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-08-01 00:24:36,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.47 | bwd_microstep: 5045.28 | bwd_inner_microstep: 4656.28 | bwd_allreduce_microstep: 388.92 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-08-01 00:24:45,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.74 | bwd_microstep: 4908.52 | bwd_inner_microstep: 4882.84 | bwd_allreduce_microstep: 25.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-08-01 00:24:54,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.16 | bwd_microstep: 5023.21 | bwd_inner_microstep: 4979.93 | bwd_allreduce_microstep: 43.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-08-01 00:25:03,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.46 | bwd_microstep: 5181.00 | bwd_inner_microstep: 4777.64 | bwd_allreduce_microstep: 403.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-08-01 00:25:11,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-08-01 00:25:11,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.95 | bwd_microstep: 5052.32 | bwd_inner_microstep: 4661.10 | bwd_allreduce_microstep: 391.15 | step_microstep: 182.48 [2024-08-01 00:25:11,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27915.95 | bwd: 41231.10 | bwd_inner: 38744.31 | bwd_allreduce: 2486.30 | step: 183.06 92%|█████████▏| 1131/1230 [22:13:17<1:54:37, 69.47s/it] {'loss': 1.1181, 'learning_rate': 3.3790770857976884e-07, 'epoch': 0.92} 92%|█████████▏| 1131/1230 [22:13:17<1:54:37, 69.47s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3910 [2024-08-01 00:25:21,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.50 | bwd_microstep: 5275.15 | bwd_inner_microstep: 5240.24 | bwd_allreduce_microstep: 34.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 
[2024-08-01 00:25:29,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.29 | bwd_microstep: 5046.25 | bwd_inner_microstep: 5022.53 | bwd_allreduce_microstep: 23.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-08-01 00:25:38,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.29 | bwd_microstep: 5107.26 | bwd_inner_microstep: 5034.16 | bwd_allreduce_microstep: 73.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-08-01 00:25:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.13 | bwd_microstep: 5071.16 | bwd_inner_microstep: 5012.03 | bwd_allreduce_microstep: 59.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-08-01 00:25:55,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.06 | bwd_microstep: 5222.23 | bwd_inner_microstep: 4813.49 | bwd_allreduce_microstep: 408.67 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3797 [2024-08-01 00:26:04,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.31 | bwd_microstep: 5047.59 | bwd_inner_microstep: 4997.55 | bwd_allreduce_microstep: 49.97 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2141 [2024-08-01 00:26:13,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.06 | bwd_microstep: 5174.59 | bwd_inner_microstep: 4771.15 | bwd_allreduce_microstep: 403.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-08-01 00:26:22,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-08-01 00:26:22,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.51 | bwd_microstep: 4916.99 | 
bwd_inner_microstep: 4889.57 | bwd_allreduce_microstep: 27.35 | step_microstep: 182.03 [2024-08-01 00:26:22,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29055.04 | bwd: 40861.21 | bwd_inner: 39780.65 | bwd_allreduce: 1080.07 | step: 182.60 92%|█████████▏| 1132/1230 [22:14:27<1:53:51, 69.70s/it] {'loss': 1.1336, 'learning_rate': 3.3115350656948043e-07, 'epoch': 0.92} 92%|█████████▏| 1132/1230 [22:14:27<1:53:51, 69.70s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-08-01 00:26:31,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3815.10 | bwd_microstep: 5435.81 | bwd_inner_microstep: 5361.56 | bwd_allreduce_microstep: 74.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3858 [2024-08-01 00:26:40,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.38 | bwd_microstep: 5173.25 | bwd_inner_microstep: 5125.63 | bwd_allreduce_microstep: 47.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-08-01 00:26:48,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3062.26 | bwd_microstep: 5094.45 | bwd_inner_microstep: 4700.90 | bwd_allreduce_microstep: 393.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785 [2024-08-01 00:26:57,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.82 | bwd_microstep: 5024.83 | bwd_inner_microstep: 5005.56 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-08-01 00:27:05,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.25 | bwd_microstep: 5216.18 | bwd_inner_microstep: 4811.35 | bwd_allreduce_microstep: 404.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 
[2024-08-01 00:27:14,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3065.71 | bwd_microstep: 5035.39 | bwd_inner_microstep: 4646.54 | bwd_allreduce_microstep: 388.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-08-01 00:27:22,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.51 | bwd_microstep: 5060.84 | bwd_inner_microstep: 5001.07 | bwd_allreduce_microstep: 59.69 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-08-01 00:27:31,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-08-01 00:27:31,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.86 | bwd_microstep: 5195.29 | bwd_inner_microstep: 4788.75 | bwd_allreduce_microstep: 406.47 | step_microstep: 181.73 [2024-08-01 00:27:31,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27949.78 | bwd: 41236.02 | bwd_inner: 39441.31 | bwd_allreduce: 1794.22 | step: 182.32 92%|█████████▏| 1133/1230 [22:15:37<1:52:35, 69.65s/it] {'loss': 1.1421, 'learning_rate': 3.24466353708538e-07, 'epoch': 0.92} 92%|█████████▏| 1133/1230 [22:15:37<1:52:35, 69.65s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-08-01 00:27:40,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.91 | bwd_microstep: 5273.02 | bwd_inner_microstep: 5174.38 | bwd_allreduce_microstep: 98.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3970 [2024-08-01 00:27:49,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.94 | bwd_microstep: 5070.99 | bwd_inner_microstep: 5049.97 | bwd_allreduce_microstep: 20.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-08-01 00:27:58,078] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3597.80 | bwd_microstep: 5124.02 | bwd_inner_microstep: 5046.03 | bwd_allreduce_microstep: 77.89 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3599 [2024-08-01 00:28:06,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.88 | bwd_microstep: 5193.38 | bwd_inner_microstep: 5104.35 | bwd_allreduce_microstep: 88.96 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-08-01 00:28:15,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.02 | bwd_microstep: 5126.74 | bwd_inner_microstep: 4727.48 | bwd_allreduce_microstep: 399.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-08-01 00:28:24,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.22 | bwd_microstep: 5153.62 | bwd_inner_microstep: 4754.33 | bwd_allreduce_microstep: 399.22 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2109 [2024-08-01 00:28:33,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.48 | bwd_microstep: 5225.64 | bwd_inner_microstep: 4818.44 | bwd_allreduce_microstep: 407.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-08-01 00:28:41,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-08-01 00:28:41,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.08 | bwd_microstep: 4883.20 | bwd_inner_microstep: 4863.78 | bwd_allreduce_microstep: 19.34 | step_microstep: 182.25 [2024-08-01 00:28:41,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28832.22 | bwd: 41050.56 | bwd_inner: 39538.70 | bwd_allreduce: 1511.37 | step: 182.94 92%|█████████▏| 1134/1230 [22:16:47<1:51:42, 69.82s/it] {'loss': 1.0703, 'learning_rate': 3.1784629636937404e-07, 
'epoch': 0.92} 92%|█████████▏| 1134/1230 [22:16:47<1:51:42, 69.82s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4056 [2024-08-01 00:28:51,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3864.30 | bwd_microstep: 5332.67 | bwd_inner_microstep: 5313.56 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4000 [2024-08-01 00:29:00,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.04 | bwd_microstep: 5349.63 | bwd_inner_microstep: 5308.23 | bwd_allreduce_microstep: 41.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-08-01 00:29:08,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.34 | bwd_microstep: 5167.89 | bwd_inner_microstep: 5112.07 | bwd_allreduce_microstep: 55.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-08-01 00:29:17,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.97 | bwd_microstep: 4997.14 | bwd_inner_microstep: 4977.79 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-08-01 00:29:26,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.74 | bwd_microstep: 5034.64 | bwd_inner_microstep: 5006.63 | bwd_allreduce_microstep: 27.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 [2024-08-01 00:29:35,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.89 | bwd_microstep: 5065.20 | bwd_inner_microstep: 4672.70 | bwd_allreduce_microstep: 392.42 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-08-01 00:29:43,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3686.39 | bwd_microstep: 4891.29 | bwd_inner_microstep: 4870.26 | bwd_allreduce_microstep: 20.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-08-01 00:29:52,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-08-01 00:29:52,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.12 | bwd_microstep: 5068.10 | bwd_inner_microstep: 4674.93 | bwd_allreduce_microstep: 393.10 | step_microstep: 182.21 [2024-08-01 00:29:52,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29300.70 | bwd: 40906.53 | bwd_inner: 39936.12 | bwd_allreduce: 969.92 | step: 182.81 92%|█████████▏| 1135/1230 [22:17:58<1:50:53, 70.04s/it] {'loss': 1.1196, 'learning_rate': 3.1129338045914004e-07, 'epoch': 0.92} 92%|█████████▏| 1135/1230 [22:17:58<1:50:53, 70.04s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-08-01 00:30:01,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.45 | bwd_microstep: 5192.75 | bwd_inner_microstep: 5144.32 | bwd_allreduce_microstep: 48.36 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2033 [2024-08-01 00:30:09,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3309.18 | bwd_microstep: 5143.54 | bwd_inner_microstep: 4744.79 | bwd_allreduce_microstep: 398.68 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2064 [2024-08-01 00:30:18,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.02 | bwd_microstep: 5173.82 | bwd_inner_microstep: 4771.57 | bwd_allreduce_microstep: 402.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-08-01 00:30:27,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.65 | bwd_microstep: 5021.69 | bwd_inner_microstep: 
4998.20 | bwd_allreduce_microstep: 23.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-08-01 00:30:35,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.43 | bwd_microstep: 4998.33 | bwd_inner_microstep: 4964.63 | bwd_allreduce_microstep: 33.63 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3703 [2024-08-01 00:30:44,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.82 | bwd_microstep: 5115.07 | bwd_inner_microstep: 5045.62 | bwd_allreduce_microstep: 69.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-08-01 00:30:53,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.12 | bwd_microstep: 4900.35 | bwd_inner_microstep: 4878.68 | bwd_allreduce_microstep: 21.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-08-01 00:31:01,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 00:31:01,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.61 | bwd_microstep: 4998.07 | bwd_inner_microstep: 4951.24 | bwd_allreduce_microstep: 46.75 | step_microstep: 182.95 [2024-08-01 00:31:01,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28536.17 | bwd: 40543.60 | bwd_inner: 39499.01 | bwd_allreduce: 1044.12 | step: 183.53 92%|█████████▏| 1136/1230 [22:19:07<1:49:25, 69.85s/it] {'loss': 1.1564, 'learning_rate': 3.0480765141939316e-07, 'epoch': 0.92} 92%|█████████▏| 1136/1230 [22:19:07<1:49:25, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2384 [2024-08-01 00:31:10,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.62 | bwd_microstep: 5457.56 | bwd_inner_microstep: 5037.84 | bwd_allreduce_microstep: 419.65 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-08-01 00:31:19,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.16 | bwd_microstep: 5287.48 | bwd_inner_microstep: 4878.84 | bwd_allreduce_microstep: 408.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-08-01 00:31:28,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.44 | bwd_microstep: 5423.65 | bwd_inner_microstep: 5004.62 | bwd_allreduce_microstep: 418.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3889 [2024-08-01 00:31:37,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.47 | bwd_microstep: 5124.68 | bwd_inner_microstep: 5105.40 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3633 [2024-08-01 00:31:46,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.94 | bwd_microstep: 5194.19 | bwd_inner_microstep: 5090.25 | bwd_allreduce_microstep: 103.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-08-01 00:31:55,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.56 | bwd_microstep: 5161.68 | bwd_inner_microstep: 5108.75 | bwd_allreduce_microstep: 52.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-08-01 00:32:03,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3036.04 | bwd_microstep: 4974.15 | bwd_inner_microstep: 4589.59 | bwd_allreduce_microstep: 384.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-08-01 00:32:11,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 00:32:11,504] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3202.99 | bwd_microstep: 4733.56 | bwd_inner_microstep: 4710.71 | bwd_allreduce_microstep: 22.78 | step_microstep: 181.50 [2024-08-01 00:32:11,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28019.12 | bwd: 41356.94 | bwd_inner: 39525.94 | bwd_allreduce: 1830.51 | step: 182.07 92%|█████████▏| 1137/1230 [22:20:17<1:48:12, 69.81s/it] {'loss': 1.1543, 'learning_rate': 2.9838915422577887e-07, 'epoch': 0.92} 92%|█████████▏| 1137/1230 [22:20:17<1:48:12, 69.81s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3950 [2024-08-01 00:32:20,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.91 | bwd_microstep: 5507.44 | bwd_inner_microstep: 5424.36 | bwd_allreduce_microstep: 83.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-08-01 00:32:29,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3918.71 | bwd_microstep: 5138.36 | bwd_inner_microstep: 5117.59 | bwd_allreduce_microstep: 20.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3761 [2024-08-01 00:32:38,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.19 | bwd_microstep: 5179.13 | bwd_inner_microstep: 5099.82 | bwd_allreduce_microstep: 79.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-08-01 00:32:47,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.62 | bwd_microstep: 5239.72 | bwd_inner_microstep: 5145.27 | bwd_allreduce_microstep: 94.38 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-08-01 00:32:56,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.32 | bwd_microstep: 5185.20 | bwd_inner_microstep: 4783.56 | bwd_allreduce_microstep: 401.58 | step_microstep: 0.08 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-08-01 00:33:04,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3326.23 | bwd_microstep: 4917.79 | bwd_inner_microstep: 4876.66 | bwd_allreduce_microstep: 41.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-08-01 00:33:13,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.58 | bwd_microstep: 4877.64 | bwd_inner_microstep: 4856.72 | bwd_allreduce_microstep: 20.85 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2899 [2024-08-01 00:33:21,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 00:33:21,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.71 | bwd_microstep: 5094.28 | bwd_inner_microstep: 4695.92 | bwd_allreduce_microstep: 398.30 | step_microstep: 181.57 [2024-08-01 00:33:21,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28998.22 | bwd: 41139.55 | bwd_inner: 39999.83 | bwd_allreduce: 1139.23 | step: 182.14 93%|█████████▎| 1138/1230 [22:21:27<1:47:20, 70.01s/it] {'loss': 1.1087, 'learning_rate': 2.920379333877221e-07, 'epoch': 0.93} 93%|█████████▎| 1138/1230 [22:21:27<1:47:20, 70.01s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2426 [2024-08-01 00:33:30,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.04 | bwd_microstep: 5229.62 | bwd_inner_microstep: 4825.53 | bwd_allreduce_microstep: 404.02 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3784 [2024-08-01 00:33:39,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.23 | bwd_microstep: 5182.66 | bwd_inner_microstep: 5105.38 | bwd_allreduce_microstep: 77.21 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3835 [2024-08-01 
00:33:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.62 | bwd_microstep: 5185.56 | bwd_inner_microstep: 5116.68 | bwd_allreduce_microstep: 68.82 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-08-01 00:33:57,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.49 | bwd_microstep: 5197.72 | bwd_inner_microstep: 4792.98 | bwd_allreduce_microstep: 404.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-08-01 00:34:05,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.46 | bwd_microstep: 5178.08 | bwd_inner_microstep: 4775.05 | bwd_allreduce_microstep: 402.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3860 [2024-08-01 00:34:14,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3291.09 | bwd_microstep: 4910.06 | bwd_inner_microstep: 4890.76 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-08-01 00:34:22,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.88 | bwd_microstep: 5065.09 | bwd_inner_microstep: 4673.85 | bwd_allreduce_microstep: 391.17 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3666 [2024-08-01 00:34:31,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-08-01 00:34:31,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.90 | bwd_microstep: 5000.62 | bwd_inner_microstep: 4930.22 | bwd_allreduce_microstep: 70.33 | step_microstep: 181.65 [2024-08-01 00:34:31,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28271.62 | bwd: 40949.40 | bwd_inner: 39110.39 | bwd_allreduce: 1838.52 | step: 182.35 93%|█████████▎| 1139/1230 [22:22:37<1:45:58, 
69.87s/it] {'loss': 1.1275, 'learning_rate': 2.8575403294811123e-07, 'epoch': 0.93} 93%|█████████▎| 1139/1230 [22:22:37<1:45:58, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3527 [2024-08-01 00:34:40,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.02 | bwd_microstep: 5219.13 | bwd_inner_microstep: 5123.35 | bwd_allreduce_microstep: 95.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-08-01 00:34:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.75 | bwd_microstep: 5114.22 | bwd_inner_microstep: 5094.97 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2625 [2024-08-01 00:34:58,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.29 | bwd_microstep: 5226.04 | bwd_inner_microstep: 4819.38 | bwd_allreduce_microstep: 406.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3604 [2024-08-01 00:35:06,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.72 | bwd_microstep: 5126.52 | bwd_inner_microstep: 5036.57 | bwd_allreduce_microstep: 89.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-08-01 00:35:15,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.60 | bwd_microstep: 5126.23 | bwd_inner_microstep: 5072.59 | bwd_allreduce_microstep: 53.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-08-01 00:35:24,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.50 | bwd_microstep: 5177.90 | bwd_inner_microstep: 5097.84 | bwd_allreduce_microstep: 79.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-08-01 
00:35:33,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.92 | bwd_microstep: 5124.51 | bwd_inner_microstep: 5071.66 | bwd_allreduce_microstep: 52.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-08-01 00:35:42,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-08-01 00:35:42,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.46 | bwd_microstep: 5180.12 | bwd_inner_microstep: 5102.26 | bwd_allreduce_microstep: 77.79 | step_microstep: 182.48 [2024-08-01 00:35:42,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29163.17 | bwd: 41294.64 | bwd_inner: 40418.56 | bwd_allreduce: 875.61 | step: 183.06 93%|█████████▎| 1140/1230 [22:23:48<1:45:13, 70.14s/it] {'loss': 1.1603, 'learning_rate': 2.795374964830022e-07, 'epoch': 0.93} 93%|█████████▎| 1140/1230 [22:23:48<1:45:13, 70.14s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874 [2024-08-01 00:35:51,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3823.58 | bwd_microstep: 5265.20 | bwd_inner_microstep: 5229.21 | bwd_allreduce_microstep: 35.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-08-01 00:36:00,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3807.43 | bwd_microstep: 5307.29 | bwd_inner_microstep: 5253.27 | bwd_allreduce_microstep: 53.95 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2843 [2024-08-01 00:36:09,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.05 | bwd_microstep: 5252.68 | bwd_inner_microstep: 4844.87 | bwd_allreduce_microstep: 407.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-08-01 00:36:18,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3631.25 | bwd_microstep: 5195.11 | bwd_inner_microstep: 5135.39 | bwd_allreduce_microstep: 59.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-08-01 00:36:26,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3064.79 | bwd_microstep: 5029.87 | bwd_inner_microstep: 4642.20 | bwd_allreduce_microstep: 387.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-08-01 00:36:35,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.85 | bwd_microstep: 5118.81 | bwd_inner_microstep: 5051.74 | bwd_allreduce_microstep: 67.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777 [2024-08-01 00:36:43,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.57 | bwd_microstep: 5027.04 | bwd_inner_microstep: 5007.71 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-08-01 00:36:52,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 00:36:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.24 | bwd_microstep: 5090.15 | bwd_inner_microstep: 4692.98 | bwd_allreduce_microstep: 397.10 | step_microstep: 181.80 [2024-08-01 00:36:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28725.66 | bwd: 41286.13 | bwd_inner: 39857.30 | bwd_allreduce: 1428.35 | step: 182.38 93%|█████████▎| 1141/1230 [22:24:58<1:44:08, 70.20s/it] {'loss': 1.1318, 'learning_rate': 2.733883671013082e-07, 'epoch': 0.93} 93%|█████████▎| 1141/1230 [22:24:58<1:44:08, 70.20s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-08-01 00:37:01,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.27 | bwd_microstep: 5243.28 | bwd_inner_microstep: 
5220.34 | bwd_allreduce_microstep: 22.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3813 [2024-08-01 00:37:10,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.15 | bwd_microstep: 5303.78 | bwd_inner_microstep: 5236.99 | bwd_allreduce_microstep: 66.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-08-01 00:37:19,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.46 | bwd_microstep: 5167.49 | bwd_inner_microstep: 5092.32 | bwd_allreduce_microstep: 75.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-08-01 00:37:28,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.67 | bwd_microstep: 5204.60 | bwd_inner_microstep: 5117.39 | bwd_allreduce_microstep: 87.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-08-01 00:37:37,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.24 | bwd_microstep: 5247.04 | bwd_inner_microstep: 4841.01 | bwd_allreduce_microstep: 405.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-08-01 00:37:45,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.62 | bwd_microstep: 5238.11 | bwd_inner_microstep: 4833.25 | bwd_allreduce_microstep: 404.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-08-01 00:37:54,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.39 | bwd_microstep: 4868.10 | bwd_inner_microstep: 4848.69 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-08-01 00:38:03,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_step: 92.56 [2024-08-01 00:38:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.81 | bwd_microstep: 5045.52 | bwd_inner_microstep: 4979.81 | bwd_allreduce_microstep: 65.64 | step_microstep: 181.17 [2024-08-01 00:38:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28960.53 | bwd: 41317.90 | bwd_inner: 40169.74 | bwd_allreduce: 1147.68 | step: 181.74 93%|█████████▎| 1142/1230 [22:26:09<1:43:08, 70.33s/it] {'loss': 1.1621, 'learning_rate': 2.673066874445096e-07, 'epoch': 0.93} 93%|█████████▎| 1142/1230 [22:26:09<1:43:08, 70.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4028 [2024-08-01 00:38:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.72 | bwd_microstep: 5582.37 | bwd_inner_microstep: 5505.73 | bwd_allreduce_microstep: 76.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3566 [2024-08-01 00:38:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.50 | bwd_microstep: 5188.22 | bwd_inner_microstep: 5103.79 | bwd_allreduce_microstep: 84.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-08-01 00:38:30,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.58 | bwd_microstep: 5185.09 | bwd_inner_microstep: 5099.19 | bwd_allreduce_microstep: 85.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-08-01 00:38:39,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.83 | bwd_microstep: 5154.77 | bwd_inner_microstep: 5100.22 | bwd_allreduce_microstep: 54.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-08-01 00:38:47,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.46 | bwd_microstep: 5040.39 | bwd_inner_microstep: 
5011.15 | bwd_allreduce_microstep: 29.17 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3601 [2024-08-01 00:38:56,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.38 | bwd_microstep: 5133.08 | bwd_inner_microstep: 5069.38 | bwd_allreduce_microstep: 63.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-08-01 00:39:05,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.25 | bwd_microstep: 4878.77 | bwd_inner_microstep: 4859.38 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-08-01 00:39:13,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-08-01 00:39:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.88 | bwd_microstep: 4871.51 | bwd_inner_microstep: 4851.66 | bwd_allreduce_microstep: 19.78 | step_microstep: 183.54 [2024-08-01 00:39:13,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29347.50 | bwd: 41034.19 | bwd_inner: 40600.45 | bwd_allreduce: 433.25 | step: 184.12 93%|█████████▎| 1143/1230 [22:27:19<1:42:08, 70.44s/it] {'loss': 1.1318, 'learning_rate': 2.612924996863453e-07, 'epoch': 0.93} 93%|█████████▎| 1143/1230 [22:27:19<1:42:08, 70.44s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2328 [2024-08-01 00:39:22,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3121.06 | bwd_microstep: 5219.62 | bwd_inner_microstep: 4820.78 | bwd_allreduce_microstep: 398.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3854 [2024-08-01 00:39:31,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.54 | bwd_microstep: 5196.33 | bwd_inner_microstep: 5145.24 | bwd_allreduce_microstep: 51.03 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-08-01 00:39:39,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.40 | bwd_microstep: 5123.85 | bwd_inner_microstep: 4724.63 | bwd_allreduce_microstep: 399.14 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-08-01 00:39:48,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.35 | bwd_microstep: 5007.21 | bwd_inner_microstep: 4983.07 | bwd_allreduce_microstep: 24.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-08-01 00:39:57,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.34 | bwd_microstep: 5145.31 | bwd_inner_microstep: 5090.09 | bwd_allreduce_microstep: 55.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-08-01 00:40:06,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.71 | bwd_microstep: 5160.11 | bwd_inner_microstep: 5077.69 | bwd_allreduce_microstep: 82.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-08-01 00:40:14,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.88 | bwd_microstep: 5115.95 | bwd_inner_microstep: 5045.76 | bwd_allreduce_microstep: 70.12 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3657 [2024-08-01 00:40:23,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-08-01 00:40:23,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.72 | bwd_microstep: 5127.35 | bwd_inner_microstep: 5041.32 | bwd_allreduce_microstep: 85.96 | step_microstep: 182.53 [2024-08-01 00:40:23,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28393.89 | bwd: 41095.70 | bwd_inner: 39928.52 | 
bwd_allreduce: 1166.71 | step: 183.21 93%|█████████▎| 1144/1230 [22:28:29<1:40:42, 70.26s/it] {'loss': 1.1347, 'learning_rate': 2.5534584553253526e-07, 'epoch': 0.93} 93%|█████████▎| 1144/1230 [22:28:29<1:40:42, 70.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-08-01 00:40:32,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.47 | bwd_microstep: 5205.70 | bwd_inner_microstep: 5183.08 | bwd_allreduce_microstep: 22.56 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2318 [2024-08-01 00:40:41,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.60 | bwd_microstep: 5271.36 | bwd_inner_microstep: 4866.02 | bwd_allreduce_microstep: 405.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-08-01 00:40:50,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.41 | bwd_microstep: 5273.70 | bwd_inner_microstep: 4867.51 | bwd_allreduce_microstep: 406.12 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3653 [2024-08-01 00:40:59,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.79 | bwd_microstep: 5175.22 | bwd_inner_microstep: 5081.58 | bwd_allreduce_microstep: 93.58 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3619 [2024-08-01 00:41:07,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.20 | bwd_microstep: 5104.01 | bwd_inner_microstep: 5022.05 | bwd_allreduce_microstep: 81.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-08-01 00:41:16,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.10 | bwd_microstep: 5114.09 | bwd_inner_microstep: 5048.42 | bwd_allreduce_microstep: 65.60 | step_microstep: 0.08 dynamic ViT 
batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-08-01 00:41:25,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.56 | bwd_microstep: 5088.44 | bwd_inner_microstep: 4691.68 | bwd_allreduce_microstep: 396.70 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-08-01 00:41:33,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:41:33,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.04 | bwd_microstep: 5060.22 | bwd_inner_microstep: 4667.79 | bwd_allreduce_microstep: 392.37 | step_microstep: 181.60 [2024-08-01 00:41:33,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28560.06 | bwd: 41292.73 | bwd_inner: 39428.06 | bwd_allreduce: 1864.19 | step: 182.18 93%|█████████▎| 1145/1230 [22:29:39<1:39:30, 70.24s/it] {'loss': 1.1108, 'learning_rate': 2.494667662204786e-07, 'epoch': 0.93} 93%|█████████▎| 1145/1230 [22:29:39<1:39:30, 70.24s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2392 [2024-08-01 00:41:42,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.02 | bwd_microstep: 5338.14 | bwd_inner_microstep: 4929.07 | bwd_allreduce_microstep: 409.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-08-01 00:41:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3278.37 | bwd_microstep: 4857.56 | bwd_inner_microstep: 4834.77 | bwd_allreduce_microstep: 22.71 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3765 [2024-08-01 00:41:59,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.83 | bwd_microstep: 5180.40 | bwd_inner_microstep: 5115.42 | bwd_allreduce_microstep: 64.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 
[2024-08-01 00:42:07,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3198.85 | bwd_microstep: 4805.55 | bwd_inner_microstep: 4766.39 | bwd_allreduce_microstep: 39.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-08-01 00:42:16,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.69 | bwd_microstep: 4977.56 | bwd_inner_microstep: 4958.21 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-08-01 00:42:25,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.64 | bwd_microstep: 4999.87 | bwd_inner_microstep: 4941.24 | bwd_allreduce_microstep: 58.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-08-01 00:42:33,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.80 | bwd_microstep: 4987.53 | bwd_inner_microstep: 4934.28 | bwd_allreduce_microstep: 53.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-08-01 00:42:42,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-08-01 00:42:42,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.47 | bwd_microstep: 4990.92 | bwd_inner_microstep: 4941.21 | bwd_allreduce_microstep: 49.65 | step_microstep: 181.34 [2024-08-01 00:42:42,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28039.59 | bwd: 40137.50 | bwd_inner: 39420.54 | bwd_allreduce: 716.49 | step: 181.91 93%|█████████▎| 1146/1230 [22:30:48<1:37:36, 69.72s/it] {'loss': 1.1349, 'learning_rate': 2.4365530251897693e-07, 'epoch': 0.93} 93%|█████████▎| 1146/1230 [22:30:48<1:37:36, 69.72s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3872 [2024-08-01 00:42:51,540] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3811.29 | bwd_microstep: 5207.41 | bwd_inner_microstep: 5180.86 | bwd_allreduce_microstep: 26.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-08-01 00:43:00,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.41 | bwd_microstep: 5349.77 | bwd_inner_microstep: 4937.01 | bwd_allreduce_microstep: 412.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-08-01 00:43:09,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.00 | bwd_microstep: 5000.94 | bwd_inner_microstep: 4981.53 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-08-01 00:43:17,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.61 | bwd_microstep: 4977.17 | bwd_inner_microstep: 4957.84 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-08-01 00:43:26,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.61 | bwd_microstep: 5013.10 | bwd_inner_microstep: 4971.13 | bwd_allreduce_microstep: 41.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-08-01 00:43:35,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.62 | bwd_microstep: 5156.71 | bwd_inner_microstep: 5086.39 | bwd_allreduce_microstep: 70.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-08-01 00:43:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.07 | bwd_microstep: 5142.90 | bwd_inner_microstep: 4746.04 | bwd_allreduce_microstep: 396.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-08-01 
00:43:52,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 00:43:52,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.00 | bwd_microstep: 5002.53 | bwd_inner_microstep: 4949.47 | bwd_allreduce_microstep: 53.00 | step_microstep: 181.49 [2024-08-01 00:43:52,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29309.53 | bwd: 40850.51 | bwd_inner: 39810.20 | bwd_allreduce: 1039.83 | step: 182.07 93%|█████████▎| 1147/1230 [22:31:58<1:36:45, 69.95s/it] {'loss': 1.119, 'learning_rate': 2.3791149472794261e-07, 'epoch': 0.93} 93%|█████████▎| 1147/1230 [22:31:58<1:36:45, 69.95s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2425 [2024-08-01 00:44:01,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.64 | bwd_microstep: 5323.68 | bwd_inner_microstep: 4912.93 | bwd_allreduce_microstep: 410.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-08-01 00:44:10,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.18 | bwd_microstep: 5228.43 | bwd_inner_microstep: 5167.85 | bwd_allreduce_microstep: 60.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-08-01 00:44:19,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.94 | bwd_microstep: 5105.52 | bwd_inner_microstep: 5039.35 | bwd_allreduce_microstep: 66.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3709 [2024-08-01 00:44:28,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.72 | bwd_microstep: 5144.51 | bwd_inner_microstep: 5053.83 | bwd_allreduce_microstep: 90.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-08-01 00:44:36,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3581.33 | bwd_microstep: 5066.85 | bwd_inner_microstep: 5004.72 | bwd_allreduce_microstep: 62.07 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2110 [2024-08-01 00:44:44,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3012.06 | bwd_microstep: 4896.64 | bwd_inner_microstep: 4523.45 | bwd_allreduce_microstep: 373.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-08-01 00:44:53,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.36 | bwd_microstep: 5011.73 | bwd_inner_microstep: 4987.13 | bwd_allreduce_microstep: 24.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2121 [2024-08-01 00:45:02,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-08-01 00:45:02,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.49 | bwd_microstep: 5217.95 | bwd_inner_microstep: 4811.31 | bwd_allreduce_microstep: 406.56 | step_microstep: 181.73 [2024-08-01 00:45:02,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28214.63 | bwd: 40995.30 | bwd_inner: 39500.52 | bwd_allreduce: 1494.29 | step: 182.30 93%|█████████▎| 1148/1230 [22:33:08<1:35:25, 69.83s/it] {'loss': 1.1273, 'learning_rate': 2.3223538267813317e-07, 'epoch': 0.93} 93%|█████████▎| 1148/1230 [22:33:08<1:35:25, 69.83s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3946 [2024-08-01 00:45:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.45 | bwd_microstep: 5458.38 | bwd_inner_microstep: 5384.08 | bwd_allreduce_microstep: 74.23 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2289 [2024-08-01 00:45:20,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.72 | bwd_microstep: 5345.09 | bwd_inner_microstep: 
4932.02 | bwd_allreduce_microstep: 413.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-08-01 00:45:29,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.68 | bwd_microstep: 5132.28 | bwd_inner_microstep: 5088.16 | bwd_allreduce_microstep: 44.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-08-01 00:45:38,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.88 | bwd_microstep: 5126.99 | bwd_inner_microstep: 5059.42 | bwd_allreduce_microstep: 67.52 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3640 [2024-08-01 00:45:47,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.24 | bwd_microstep: 5331.51 | bwd_inner_microstep: 5241.69 | bwd_allreduce_microstep: 89.75 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-08-01 00:45:55,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.87 | bwd_microstep: 4721.54 | bwd_inner_microstep: 4697.72 | bwd_allreduce_microstep: 23.75 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-08-01 00:46:03,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.38 | bwd_microstep: 5158.34 | bwd_inner_microstep: 4757.35 | bwd_allreduce_microstep: 400.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3938 [2024-08-01 00:46:12,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-08-01 00:46:12,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.67 | bwd_microstep: 5155.38 | bwd_inner_microstep: 5118.40 | bwd_allreduce_microstep: 36.92 | step_microstep: 183.04 [2024-08-01 00:46:12,837] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 28545.78 | bwd: 41429.50 | bwd_inner: 40278.77 | bwd_allreduce: 1150.25 | step: 183.73 93%|█████████▎| 1149/1230 [22:34:18<1:34:27, 69.97s/it] {'loss': 1.1704, 'learning_rate': 2.2662700573085505e-07, 'epoch': 0.93} 93%|█████████▎| 1149/1230 [22:34:18<1:34:27, 69.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3907 [2024-08-01 00:46:21,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.09 | bwd_microstep: 5142.30 | bwd_inner_microstep: 5123.17 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3929 [2024-08-01 00:46:30,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.37 | bwd_microstep: 5070.19 | bwd_inner_microstep: 5043.98 | bwd_allreduce_microstep: 26.14 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2825 [2024-08-01 00:46:39,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.11 | bwd_microstep: 5176.77 | bwd_inner_microstep: 4774.72 | bwd_allreduce_microstep: 401.98 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-08-01 00:46:48,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.58 | bwd_microstep: 5001.47 | bwd_inner_microstep: 4982.10 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1082 [2024-08-01 00:46:56,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.41 | bwd_microstep: 5246.83 | bwd_inner_microstep: 4842.82 | bwd_allreduce_microstep: 403.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2097 [2024-08-01 00:47:05,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.27 | bwd_microstep: 5232.34 | bwd_inner_microstep: 4825.56 | 
bwd_allreduce_microstep: 406.70 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2174 [2024-08-01 00:47:14,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.21 | bwd_microstep: 5079.26 | bwd_inner_microstep: 4685.03 | bwd_allreduce_microstep: 394.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-08-01 00:47:23,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-08-01 00:47:23,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.07 | bwd_microstep: 5153.19 | bwd_inner_microstep: 5081.11 | bwd_allreduce_microstep: 72.01 | step_microstep: 181.45 [2024-08-01 00:47:23,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28879.03 | bwd: 41102.32 | bwd_inner: 39358.43 | bwd_allreduce: 1743.40 | step: 182.03 93%|█████████▎| 1150/1230 [22:35:29<1:33:25, 70.07s/it] {'loss': 1.1416, 'learning_rate': 2.2108640277771153e-07, 'epoch': 0.93} 93%|█████████▎| 1150/1230 [22:35:29<1:33:25, 70.07s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4019 [2024-08-01 00:47:32,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3849.25 | bwd_microstep: 5236.11 | bwd_inner_microstep: 5217.02 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3840 [2024-08-01 00:47:41,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.55 | bwd_microstep: 5300.08 | bwd_inner_microstep: 5235.83 | bwd_allreduce_microstep: 64.19 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3735 [2024-08-01 00:47:50,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.81 | bwd_microstep: 5184.89 | bwd_inner_microstep: 5117.91 | bwd_allreduce_microstep: 66.91 | step_microstep: 0.08 dynamic ViT batch 
size: 6, images per sample: 3.0, dynamic token length: 1204 [2024-08-01 00:47:58,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3011.93 | bwd_microstep: 5030.65 | bwd_inner_microstep: 4644.56 | bwd_allreduce_microstep: 386.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-08-01 00:48:06,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.22 | bwd_microstep: 5001.07 | bwd_inner_microstep: 4978.17 | bwd_allreduce_microstep: 22.84 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2100 [2024-08-01 00:48:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.45 | bwd_microstep: 5061.27 | bwd_inner_microstep: 4669.70 | bwd_allreduce_microstep: 391.50 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2109 [2024-08-01 00:48:24,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3328.60 | bwd_microstep: 5465.58 | bwd_inner_microstep: 4886.21 | bwd_allreduce_microstep: 579.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-08-01 00:48:33,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-08-01 00:48:33,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.28 | bwd_microstep: 4969.06 | bwd_inner_microstep: 4937.04 | bwd_allreduce_microstep: 31.95 | step_microstep: 182.13 [2024-08-01 00:48:33,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28387.99 | bwd: 41248.68 | bwd_inner: 39686.38 | bwd_allreduce: 1561.82 | step: 182.70 94%|█████████▎| 1151/1230 [22:36:39<1:32:13, 70.04s/it] {'loss': 1.1566, 'learning_rate': 2.156136122403163e-07, 'epoch': 0.94} 94%|█████████▎| 1151/1230 [22:36:39<1:32:13, 70.04s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 
00:48:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.00 | bwd_microstep: 5347.87 | bwd_inner_microstep: 5328.68 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3559 [2024-08-01 00:48:51,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.38 | bwd_microstep: 5310.38 | bwd_inner_microstep: 5204.12 | bwd_allreduce_microstep: 106.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-08-01 00:49:00,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.51 | bwd_microstep: 5222.12 | bwd_inner_microstep: 5134.09 | bwd_allreduce_microstep: 87.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-08-01 00:49:08,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.74 | bwd_microstep: 5201.90 | bwd_inner_microstep: 4795.61 | bwd_allreduce_microstep: 406.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-08-01 00:49:17,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.46 | bwd_microstep: 5195.66 | bwd_inner_microstep: 4791.25 | bwd_allreduce_microstep: 404.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-08-01 00:49:26,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.60 | bwd_microstep: 5058.02 | bwd_inner_microstep: 4997.78 | bwd_allreduce_microstep: 60.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-08-01 00:49:35,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.22 | bwd_microstep: 4894.57 | bwd_inner_microstep: 4875.23 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2146 [2024-08-01 00:49:43,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 00:49:43,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.32 | bwd_microstep: 5120.58 | bwd_inner_microstep: 4722.33 | bwd_allreduce_microstep: 398.18 | step_microstep: 181.49 [2024-08-01 00:49:43,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29066.14 | bwd: 41351.07 | bwd_inner: 39849.04 | bwd_allreduce: 1501.54 | step: 182.06 94%|█████████▎| 1152/1230 [22:37:49<1:31:19, 70.25s/it] {'loss': 1.0761, 'learning_rate': 2.1020867207004026e-07, 'epoch': 0.94} 94%|█████████▎| 1152/1230 [22:37:49<1:31:19, 70.25s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2349 [2024-08-01 00:49:53,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.15 | bwd_microstep: 5540.32 | bwd_inner_microstep: 5111.73 | bwd_allreduce_microstep: 428.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-08-01 00:50:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.05 | bwd_microstep: 5014.05 | bwd_inner_microstep: 4625.61 | bwd_allreduce_microstep: 388.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-08-01 00:50:09,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.08 | bwd_microstep: 5126.30 | bwd_inner_microstep: 5055.27 | bwd_allreduce_microstep: 70.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-08-01 00:50:18,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.74 | bwd_microstep: 4987.36 | bwd_inner_microstep: 4968.04 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-08-01 
00:50:27,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.62 | bwd_microstep: 5025.53 | bwd_inner_microstep: 4970.82 | bwd_allreduce_microstep: 54.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-08-01 00:50:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.26 | bwd_microstep: 4991.08 | bwd_inner_microstep: 4971.73 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-08-01 00:50:44,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.00 | bwd_microstep: 5073.34 | bwd_inner_microstep: 5009.92 | bwd_allreduce_microstep: 63.35 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3653 [2024-08-01 00:50:53,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 00:50:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.13 | bwd_microstep: 5051.11 | bwd_inner_microstep: 4977.65 | bwd_allreduce_microstep: 73.38 | step_microstep: 183.11 [2024-08-01 00:50:53,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28532.93 | bwd: 40809.08 | bwd_inner: 39690.72 | bwd_allreduce: 1117.88 | step: 183.68 94%|█████████▎| 1153/1230 [22:38:59<1:29:56, 70.08s/it] {'loss': 1.0678, 'learning_rate': 2.048716197477374e-07, 'epoch': 0.94} 94%|█████████▎| 1153/1230 [22:38:59<1:29:56, 70.08s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4012 [2024-08-01 00:51:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.51 | bwd_microstep: 5558.86 | bwd_inner_microstep: 5489.50 | bwd_allreduce_microstep: 69.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3950 [2024-08-01 00:51:11,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3771.35 | bwd_microstep: 5181.97 | bwd_inner_microstep: 5162.65 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908 [2024-08-01 00:51:20,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.16 | bwd_microstep: 5152.17 | bwd_inner_microstep: 5132.82 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282 [2024-08-01 00:51:29,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.60 | bwd_microstep: 5083.94 | bwd_inner_microstep: 4688.61 | bwd_allreduce_microstep: 395.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-08-01 00:51:38,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.94 | bwd_microstep: 5058.16 | bwd_inner_microstep: 5016.25 | bwd_allreduce_microstep: 41.84 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3706 [2024-08-01 00:51:46,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.89 | bwd_microstep: 5151.49 | bwd_inner_microstep: 5073.51 | bwd_allreduce_microstep: 77.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-08-01 00:51:55,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3342.75 | bwd_microstep: 4900.61 | bwd_inner_microstep: 4862.47 | bwd_allreduce_microstep: 38.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-08-01 00:52:04,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 00:52:04,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.98 | bwd_microstep: 5087.31 | bwd_inner_microstep: 5023.69 | bwd_allreduce_microstep: 63.56 | step_microstep: 
181.40 [2024-08-01 00:52:04,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29074.08 | bwd: 41174.49 | bwd_inner: 40449.44 | bwd_allreduce: 724.58 | step: 181.98 94%|█████████▍| 1154/1230 [22:40:10<1:28:57, 70.23s/it] {'loss': 1.1022, 'learning_rate': 1.996024922834916e-07, 'epoch': 0.94} 94%|█████████▍| 1154/1230 [22:40:10<1:28:57, 70.23s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3900 [2024-08-01 00:52:13,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.59 | bwd_microstep: 5329.75 | bwd_inner_microstep: 5251.00 | bwd_allreduce_microstep: 78.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-08-01 00:52:21,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.30 | bwd_microstep: 5166.08 | bwd_inner_microstep: 5112.83 | bwd_allreduce_microstep: 53.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-08-01 00:52:30,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.84 | bwd_microstep: 5223.67 | bwd_inner_microstep: 5130.17 | bwd_allreduce_microstep: 93.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-08-01 00:52:39,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.85 | bwd_microstep: 5072.19 | bwd_inner_microstep: 5028.00 | bwd_allreduce_microstep: 44.11 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-08-01 00:52:48,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.77 | bwd_microstep: 5109.93 | bwd_inner_microstep: 5066.21 | bwd_allreduce_microstep: 43.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-08-01 00:52:56,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3210.56 | bwd_microstep: 4754.64 | bwd_inner_microstep: 4728.36 | bwd_allreduce_microstep: 26.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-08-01 00:53:04,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.06 | bwd_microstep: 5073.08 | bwd_inner_microstep: 5009.55 | bwd_allreduce_microstep: 63.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-08-01 00:53:13,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.89 [2024-08-01 00:53:13,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.20 | bwd_microstep: 4889.83 | bwd_inner_microstep: 4870.43 | bwd_allreduce_microstep: 19.33 | step_microstep: 183.23 [2024-08-01 00:53:13,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28576.07 | bwd: 40619.15 | bwd_inner: 40196.51 | bwd_allreduce: 422.14 | step: 183.91 94%|█████████▍| 1155/1230 [22:41:19<1:27:31, 70.02s/it] {'loss': 1.107, 'learning_rate': 1.9440132621635687e-07, 'epoch': 0.94} 94%|█████████▍| 1155/1230 [22:41:19<1:27:31, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-08-01 00:53:22,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.28 | bwd_microstep: 5318.77 | bwd_inner_microstep: 5289.87 | bwd_allreduce_microstep: 28.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3912 [2024-08-01 00:53:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.27 | bwd_microstep: 5177.85 | bwd_inner_microstep: 5157.53 | bwd_allreduce_microstep: 20.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-08-01 00:53:40,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.47 | bwd_microstep: 4988.01 | bwd_inner_microstep: 
4968.67 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2201 [2024-08-01 00:53:49,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.12 | bwd_microstep: 5125.56 | bwd_inner_microstep: 4729.95 | bwd_allreduce_microstep: 395.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-08-01 00:53:57,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.40 | bwd_microstep: 4957.36 | bwd_inner_microstep: 4929.41 | bwd_allreduce_microstep: 27.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-08-01 00:54:06,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.51 | bwd_microstep: 5217.70 | bwd_inner_microstep: 5130.06 | bwd_allreduce_microstep: 87.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-08-01 00:54:15,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.59 | bwd_microstep: 5000.82 | bwd_inner_microstep: 4948.88 | bwd_allreduce_microstep: 51.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-08-01 00:54:23,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 00:54:23,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.60 | bwd_microstep: 5134.34 | bwd_inner_microstep: 4736.53 | bwd_allreduce_microstep: 397.74 | step_microstep: 181.11 [2024-08-01 00:54:23,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28990.13 | bwd: 40920.39 | bwd_inner: 39890.84 | bwd_allreduce: 1029.07 | step: 181.70 94%|█████████▍| 1156/1230 [22:42:29<1:26:26, 70.09s/it] {'loss': 1.125, 'learning_rate': 1.8926815761410867e-07, 'epoch': 0.94} 94%|█████████▍| 1156/1230 [22:42:29<1:26:26, 70.09s/it]dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 4082 [2024-08-01 00:54:32,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.53 | bwd_microstep: 5404.60 | bwd_inner_microstep: 5364.08 | bwd_allreduce_microstep: 40.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-08-01 00:54:41,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3121.72 | bwd_microstep: 5290.71 | bwd_inner_microstep: 4885.42 | bwd_allreduce_microstep: 405.23 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3714 [2024-08-01 00:54:49,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3126.91 | bwd_microstep: 5041.55 | bwd_inner_microstep: 4992.69 | bwd_allreduce_microstep: 48.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-08-01 00:54:58,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.02 | bwd_microstep: 5158.99 | bwd_inner_microstep: 5076.68 | bwd_allreduce_microstep: 82.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-08-01 00:55:07,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.43 | bwd_microstep: 5014.22 | bwd_inner_microstep: 4994.89 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3767 [2024-08-01 00:55:15,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.97 | bwd_microstep: 5019.44 | bwd_inner_microstep: 4992.34 | bwd_allreduce_microstep: 27.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-08-01 00:55:24,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.07 | bwd_microstep: 5097.62 | bwd_inner_microstep: 5034.49 | 
bwd_allreduce_microstep: 63.07 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3679 [2024-08-01 00:55:33,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-08-01 00:55:33,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.68 | bwd_microstep: 5017.79 | bwd_inner_microstep: 4955.16 | bwd_allreduce_microstep: 62.57 | step_microstep: 181.56 [2024-08-01 00:55:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28102.23 | bwd: 41044.91 | bwd_inner: 40295.68 | bwd_allreduce: 748.75 | step: 182.14 94%|█████████▍| 1157/1230 [22:43:39<1:25:02, 69.90s/it] {'loss': 1.1162, 'learning_rate': 1.8420302207298623e-07, 'epoch': 0.94} 94%|█████████▍| 1157/1230 [22:43:39<1:25:02, 69.90s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-08-01 00:55:42,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.81 | bwd_microstep: 5297.35 | bwd_inner_microstep: 5233.39 | bwd_allreduce_microstep: 63.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-08-01 00:55:51,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.15 | bwd_microstep: 5177.89 | bwd_inner_microstep: 4777.00 | bwd_allreduce_microstep: 400.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3626 [2024-08-01 00:56:00,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.99 | bwd_microstep: 5319.47 | bwd_inner_microstep: 5170.71 | bwd_allreduce_microstep: 148.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-08-01 00:56:08,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3323.62 | bwd_microstep: 4951.94 | bwd_inner_microstep: 4903.39 | bwd_allreduce_microstep: 48.48 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-08-01 00:56:17,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.49 | bwd_microstep: 4982.39 | bwd_inner_microstep: 4962.96 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-08-01 00:56:25,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.03 | bwd_microstep: 5107.82 | bwd_inner_microstep: 4711.09 | bwd_allreduce_microstep: 396.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-08-01 00:56:34,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.41 | bwd_microstep: 5121.78 | bwd_inner_microstep: 4724.30 | bwd_allreduce_microstep: 397.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-08-01 00:56:43,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-08-01 00:56:43,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.91 | bwd_microstep: 5048.97 | bwd_inner_microstep: 4989.81 | bwd_allreduce_microstep: 59.09 | step_microstep: 182.05 [2024-08-01 00:56:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28533.32 | bwd: 41007.59 | bwd_inner: 39472.60 | bwd_allreduce: 1534.51 | step: 182.63 94%|█████████▍| 1158/1230 [22:44:49<1:23:52, 69.90s/it] {'loss': 1.1517, 'learning_rate': 1.792059547174507e-07, 'epoch': 0.94} 94%|█████████▍| 1158/1230 [22:44:49<1:23:52, 69.90s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3880 [2024-08-01 00:56:52,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.27 | bwd_microstep: 5366.27 | bwd_inner_microstep: 5299.73 | bwd_allreduce_microstep: 66.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2292 [2024-08-01 
00:57:01,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.57 | bwd_microstep: 5277.74 | bwd_inner_microstep: 4869.45 | bwd_allreduce_microstep: 408.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3861 [2024-08-01 00:57:10,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.36 | bwd_microstep: 5112.51 | bwd_inner_microstep: 5093.08 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-08-01 00:57:18,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.08 | bwd_microstep: 5041.86 | bwd_inner_microstep: 5019.86 | bwd_allreduce_microstep: 21.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-08-01 00:57:27,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.41 | bwd_microstep: 5127.84 | bwd_inner_microstep: 5078.89 | bwd_allreduce_microstep: 48.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-08-01 00:57:36,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.14 | bwd_microstep: 4905.46 | bwd_inner_microstep: 4879.85 | bwd_allreduce_microstep: 25.54 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3699 [2024-08-01 00:57:45,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.97 | bwd_microstep: 5347.94 | bwd_inner_microstep: 5185.12 | bwd_allreduce_microstep: 162.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-08-01 00:57:54,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 00:57:54,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.51 | bwd_microstep: 5079.84 | 
bwd_inner_microstep: 5022.49 | bwd_allreduce_microstep: 57.29 | step_microstep: 181.92 [2024-08-01 00:57:54,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29171.22 | bwd: 41259.45 | bwd_inner: 40448.41 | bwd_allreduce: 810.56 | step: 182.50 94%|█████████▍| 1159/1230 [22:45:59<1:23:01, 70.16s/it] {'loss': 1.1779, 'learning_rate': 1.7427699019994304e-07, 'epoch': 0.94} 94%|█████████▍| 1159/1230 [22:45:59<1:23:01, 70.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4077 [2024-08-01 00:58:02,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.46 | bwd_microstep: 5176.45 | bwd_inner_microstep: 5157.41 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-08-01 00:58:11,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.93 | bwd_microstep: 5158.56 | bwd_inner_microstep: 5110.48 | bwd_allreduce_microstep: 48.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3804 [2024-08-01 00:58:20,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.00 | bwd_microstep: 5162.53 | bwd_inner_microstep: 5095.02 | bwd_allreduce_microstep: 67.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-08-01 00:58:29,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.01 | bwd_microstep: 5218.83 | bwd_inner_microstep: 5132.22 | bwd_allreduce_microstep: 86.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-08-01 00:58:38,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.33 | bwd_microstep: 5009.27 | bwd_inner_microstep: 4987.84 | bwd_allreduce_microstep: 21.37 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3687 
[2024-08-01 00:58:46,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.20 | bwd_microstep: 4923.00 | bwd_inner_microstep: 4887.64 | bwd_allreduce_microstep: 35.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-08-01 00:58:55,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.42 | bwd_microstep: 5042.92 | bwd_inner_microstep: 4651.04 | bwd_allreduce_microstep: 391.82 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2924 [2024-08-01 00:59:04,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 00:59:04,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.85 | bwd_microstep: 5042.71 | bwd_inner_microstep: 4717.18 | bwd_allreduce_microstep: 325.46 | step_microstep: 182.03 [2024-08-01 00:59:04,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28958.09 | bwd: 40734.25 | bwd_inner: 39738.77 | bwd_allreduce: 995.01 | step: 182.60 94%|█████████▍| 1160/1230 [22:47:09<1:21:48, 70.12s/it] {'loss': 1.1355, 'learning_rate': 1.6941616270063965e-07, 'epoch': 0.94} 94%|█████████▍| 1160/1230 [22:47:09<1:21:48, 70.12s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3956 [2024-08-01 00:59:13,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3854.59 | bwd_microstep: 5480.50 | bwd_inner_microstep: 5422.26 | bwd_allreduce_microstep: 58.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3574 [2024-08-01 00:59:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.44 | bwd_microstep: 5323.33 | bwd_inner_microstep: 5219.77 | bwd_allreduce_microstep: 103.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-08-01 00:59:31,235] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3768.30 | bwd_microstep: 5037.86 | bwd_inner_microstep: 5014.96 | bwd_allreduce_microstep: 22.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-08-01 00:59:40,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.16 | bwd_microstep: 5395.20 | bwd_inner_microstep: 5311.02 | bwd_allreduce_microstep: 84.11 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3713 [2024-08-01 00:59:49,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.14 | bwd_microstep: 5068.24 | bwd_inner_microstep: 5037.86 | bwd_allreduce_microstep: 30.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-08-01 00:59:57,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.56 | bwd_microstep: 5111.27 | bwd_inner_microstep: 5043.79 | bwd_allreduce_microstep: 67.41 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2168 [2024-08-01 01:00:06,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.84 | bwd_microstep: 5165.94 | bwd_inner_microstep: 4763.68 | bwd_allreduce_microstep: 402.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-08-01 01:00:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-08-01 01:00:15,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.29 | bwd_microstep: 5157.75 | bwd_inner_microstep: 5082.76 | bwd_allreduce_microstep: 74.92 | step_microstep: 182.05 [2024-08-01 01:00:15,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29318.22 | bwd: 41740.04 | bwd_inner: 40896.03 | bwd_allreduce: 843.50 | step: 182.64 94%|█████████▍| 1161/1230 [22:48:21<1:21:04, 70.50s/it] {'loss': 1.141, 'learning_rate': 1.6462350592721498e-07, 
'epoch': 0.94} 94%|█████████▍| 1161/1230 [22:48:21<1:21:04, 70.50s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4087 [2024-08-01 01:00:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3879.51 | bwd_microstep: 5435.77 | bwd_inner_microstep: 5416.76 | bwd_allreduce_microstep: 18.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-08-01 01:00:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.58 | bwd_microstep: 5237.99 | bwd_inner_microstep: 5143.09 | bwd_allreduce_microstep: 94.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-08-01 01:00:42,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.19 | bwd_microstep: 5197.83 | bwd_inner_microstep: 5112.60 | bwd_allreduce_microstep: 85.16 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3732 [2024-08-01 01:00:51,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.41 | bwd_microstep: 5152.30 | bwd_inner_microstep: 5074.28 | bwd_allreduce_microstep: 77.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-08-01 01:01:00,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.23 | bwd_microstep: 5277.21 | bwd_inner_microstep: 4868.31 | bwd_allreduce_microstep: 408.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-08-01 01:01:08,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.20 | bwd_microstep: 5010.35 | bwd_inner_microstep: 4952.27 | bwd_allreduce_microstep: 58.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-08-01 01:01:17,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3678.47 | bwd_microstep: 4886.86 | bwd_inner_microstep: 4867.58 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3664 [2024-08-01 01:01:25,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-08-01 01:01:25,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3086.01 | bwd_microstep: 4844.35 | bwd_inner_microstep: 4804.43 | bwd_allreduce_microstep: 39.85 | step_microstep: 182.52 [2024-08-01 01:01:25,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28640.52 | bwd: 41042.64 | bwd_inner: 40239.26 | bwd_allreduce: 802.90 | step: 183.20 94%|█████████▍| 1162/1230 [22:49:31<1:19:44, 70.35s/it] {'loss': 1.0878, 'learning_rate': 1.5989905311461274e-07, 'epoch': 0.94} 94%|█████████▍| 1162/1230 [22:49:31<1:19:44, 70.35s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2471 [2024-08-01 01:01:34,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.12 | bwd_microstep: 5367.14 | bwd_inner_microstep: 4955.67 | bwd_allreduce_microstep: 411.40 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3907 [2024-08-01 01:01:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.72 | bwd_microstep: 5092.28 | bwd_inner_microstep: 5057.94 | bwd_allreduce_microstep: 34.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-08-01 01:01:52,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.25 | bwd_microstep: 5238.36 | bwd_inner_microstep: 5177.27 | bwd_allreduce_microstep: 61.02 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2086 [2024-08-01 01:02:00,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3264.83 | bwd_microstep: 5082.01 | bwd_inner_microstep: 
4688.20 | bwd_allreduce_microstep: 393.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3863 [2024-08-01 01:02:09,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.33 | bwd_microstep: 5077.46 | bwd_inner_microstep: 5041.31 | bwd_allreduce_microstep: 36.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-08-01 01:02:17,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.99 | bwd_microstep: 4955.44 | bwd_inner_microstep: 4573.69 | bwd_allreduce_microstep: 381.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-08-01 01:02:25,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.06 | bwd_microstep: 5232.80 | bwd_inner_microstep: 4824.48 | bwd_allreduce_microstep: 408.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-08-01 01:02:34,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 01:02:34,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.36 | bwd_microstep: 5100.51 | bwd_inner_microstep: 4705.17 | bwd_allreduce_microstep: 395.28 | step_microstep: 181.54 [2024-08-01 01:02:34,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27905.56 | bwd: 41145.98 | bwd_inner: 39023.66 | bwd_allreduce: 2121.84 | step: 182.13 95%|█████████▍| 1163/1230 [22:50:40<1:18:14, 70.06s/it] {'loss': 1.1254, 'learning_rate': 1.5524283702481158e-07, 'epoch': 0.95} 95%|█████████▍| 1163/1230 [22:50:40<1:18:14, 70.06s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4020 [2024-08-01 01:02:43,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.07 | bwd_microstep: 5099.73 | bwd_inner_microstep: 5076.88 | bwd_allreduce_microstep: 22.78 | step_microstep: 0.08 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3854 [2024-08-01 01:02:52,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.16 | bwd_microstep: 5116.67 | bwd_inner_microstep: 5097.36 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-08-01 01:03:01,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.32 | bwd_microstep: 5282.14 | bwd_inner_microstep: 4871.36 | bwd_allreduce_microstep: 410.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-08-01 01:03:10,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.89 | bwd_microstep: 5058.76 | bwd_inner_microstep: 5014.93 | bwd_allreduce_microstep: 43.76 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2207 [2024-08-01 01:03:19,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.15 | bwd_microstep: 5188.22 | bwd_inner_microstep: 4781.38 | bwd_allreduce_microstep: 406.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-08-01 01:03:27,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.40 | bwd_microstep: 4934.78 | bwd_inner_microstep: 4912.75 | bwd_allreduce_microstep: 21.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-08-01 01:03:36,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.38 | bwd_microstep: 5093.60 | bwd_inner_microstep: 5030.73 | bwd_allreduce_microstep: 62.80 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2162 [2024-08-01 01:03:45,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-08-01 01:03:45,341] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.30 | bwd_microstep: 5194.72 | bwd_inner_microstep: 4790.57 | bwd_allreduce_microstep: 404.08 | step_microstep: 209.48 [2024-08-01 01:03:45,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29189.57 | bwd: 40968.59 | bwd_inner: 39575.90 | bwd_allreduce: 1392.19 | step: 210.05 95%|█████████▍| 1164/1230 [22:51:51<1:17:13, 70.20s/it] {'loss': 1.16, 'learning_rate': 1.5065488994659983e-07, 'epoch': 0.95} 95%|█████████▍| 1164/1230 [22:51:51<1:17:13, 70.20s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 4096 [2024-08-01 01:03:53,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3337.88 | bwd_microstep: 5119.55 | bwd_inner_microstep: 5100.42 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3583 [2024-08-01 01:04:02,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.90 | bwd_microstep: 5005.60 | bwd_inner_microstep: 4931.26 | bwd_allreduce_microstep: 74.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-08-01 01:04:10,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.64 | bwd_microstep: 4804.48 | bwd_inner_microstep: 4783.79 | bwd_allreduce_microstep: 20.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-08-01 01:04:19,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.59 | bwd_microstep: 5214.57 | bwd_inner_microstep: 5157.63 | bwd_allreduce_microstep: 56.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-08-01 01:04:28,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.06 | bwd_microstep: 5202.77 | bwd_inner_microstep: 5121.17 | bwd_allreduce_microstep: 81.53 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-08-01 01:04:36,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.31 | bwd_microstep: 4911.29 | bwd_inner_microstep: 4532.35 | bwd_allreduce_microstep: 378.87 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2150 [2024-08-01 01:04:44,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.07 | bwd_microstep: 5072.08 | bwd_inner_microstep: 4678.80 | bwd_allreduce_microstep: 393.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-08-01 01:04:53,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-08-01 01:04:53,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.27 | bwd_microstep: 5025.13 | bwd_inner_microstep: 4985.79 | bwd_allreduce_microstep: 39.27 | step_microstep: 181.55 [2024-08-01 01:04:53,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27458.63 | bwd: 40355.45 | bwd_inner: 39291.16 | bwd_allreduce: 1063.79 | step: 182.12 95%|█████████▍| 1165/1230 [22:52:59<1:15:22, 69.58s/it] {'loss': 1.1375, 'learning_rate': 1.461352436953478e-07, 'epoch': 0.95} 95%|█████████▍| 1165/1230 [22:52:59<1:15:22, 69.58s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3907 [2024-08-01 01:05:02,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.06 | bwd_microstep: 5182.04 | bwd_inner_microstep: 5161.36 | bwd_allreduce_microstep: 20.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3875 [2024-08-01 01:05:11,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.56 | bwd_microstep: 5177.81 | bwd_inner_microstep: 5149.28 | bwd_allreduce_microstep: 28.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3686 [2024-08-01 01:05:20,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.34 | bwd_microstep: 5085.28 | bwd_inner_microstep: 5035.86 | bwd_allreduce_microstep: 49.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-08-01 01:05:29,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.42 | bwd_microstep: 5155.74 | bwd_inner_microstep: 4751.72 | bwd_allreduce_microstep: 403.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794 [2024-08-01 01:05:37,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.43 | bwd_microstep: 5026.62 | bwd_inner_microstep: 5007.29 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-08-01 01:05:46,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.39 | bwd_microstep: 5051.94 | bwd_inner_microstep: 4996.84 | bwd_allreduce_microstep: 55.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-08-01 01:05:55,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.90 | bwd_microstep: 5003.20 | bwd_inner_microstep: 4983.89 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2156 [2024-08-01 01:06:04,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-08-01 01:06:04,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.81 | bwd_microstep: 5127.43 | bwd_inner_microstep: 4730.20 | bwd_allreduce_microstep: 397.16 | step_microstep: 181.56 [2024-08-01 01:06:04,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29607.82 | bwd: 40810.04 | bwd_inner: 39816.39 | bwd_allreduce: 993.16 | step: 182.14 95%|█████████▍| 1166/1230 
[22:54:10<1:14:35, 69.93s/it] {'loss': 1.1231, 'learning_rate': 1.4168392961279254e-07, 'epoch': 0.95} 95%|█████████▍| 1166/1230 [22:54:10<1:14:35, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4079 [2024-08-01 01:06:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.34 | bwd_microstep: 5405.16 | bwd_inner_microstep: 5369.29 | bwd_allreduce_microstep: 35.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882 [2024-08-01 01:06:22,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.50 | bwd_microstep: 5204.87 | bwd_inner_microstep: 5151.51 | bwd_allreduce_microstep: 53.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-08-01 01:06:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.16 | bwd_microstep: 5056.66 | bwd_inner_microstep: 5037.34 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3767 [2024-08-01 01:06:39,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.03 | bwd_microstep: 5179.60 | bwd_inner_microstep: 5092.95 | bwd_allreduce_microstep: 86.58 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2270 [2024-08-01 01:06:48,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.40 | bwd_microstep: 5034.70 | bwd_inner_microstep: 4643.79 | bwd_allreduce_microstep: 390.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-08-01 01:06:56,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.94 | bwd_microstep: 5260.77 | bwd_inner_microstep: 5174.95 | bwd_allreduce_microstep: 85.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3657 [2024-08-01 01:07:05,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.69 | bwd_microstep: 5015.66 | bwd_inner_microstep: 4961.41 | bwd_allreduce_microstep: 54.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3693 [2024-08-01 01:07:14,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-08-01 01:07:14,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.90 | bwd_microstep: 4999.62 | bwd_inner_microstep: 4934.53 | bwd_allreduce_microstep: 65.00 | step_microstep: 183.08 [2024-08-01 01:07:14,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28557.85 | bwd: 41157.02 | bwd_inner: 40365.71 | bwd_allreduce: 790.82 | step: 183.66 95%|█████████▍| 1167/1230 [22:55:20<1:13:27, 69.97s/it] {'loss': 1.1206, 'learning_rate': 1.3730097856681668e-07, 'epoch': 0.95} 95%|█████████▍| 1167/1230 [22:55:20<1:13:27, 69.97s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3536 [2024-08-01 01:07:23,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.48 | bwd_microstep: 5210.61 | bwd_inner_microstep: 5116.76 | bwd_allreduce_microstep: 93.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3841 [2024-08-01 01:07:32,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.57 | bwd_microstep: 5098.83 | bwd_inner_microstep: 5079.56 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-08-01 01:07:40,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.27 | bwd_microstep: 5128.84 | bwd_inner_microstep: 5049.82 | bwd_allreduce_microstep: 78.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-08-01 01:07:49,845] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3797.01 | bwd_microstep: 5238.49 | bwd_inner_microstep: 5197.79 | bwd_allreduce_microstep: 40.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-08-01 01:07:58,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.35 | bwd_microstep: 5024.61 | bwd_inner_microstep: 4999.75 | bwd_allreduce_microstep: 24.79 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-08-01 01:08:07,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.62 | bwd_microstep: 5112.97 | bwd_inner_microstep: 5044.23 | bwd_allreduce_microstep: 68.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-08-01 01:08:15,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.28 | bwd_microstep: 4894.11 | bwd_inner_microstep: 4872.52 | bwd_allreduce_microstep: 21.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-08-01 01:08:24,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 01:08:24,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.93 | bwd_microstep: 4725.38 | bwd_inner_microstep: 4700.45 | bwd_allreduce_microstep: 24.86 | step_microstep: 181.84 [2024-08-01 01:08:24,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29037.42 | bwd: 40433.81 | bwd_inner: 40060.81 | bwd_allreduce: 372.52 | step: 182.52 95%|█████████▍| 1168/1230 [22:56:29<1:12:15, 69.92s/it] {'loss': 1.1146, 'learning_rate': 1.3298642095123882e-07, 'epoch': 0.95} 95%|█████████▍| 1168/1230 [22:56:29<1:12:15, 69.92s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2364 [2024-08-01 01:08:33,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.51 | bwd_microstep: 5364.34 | 
bwd_inner_microstep: 4953.33 | bwd_allreduce_microstep: 410.94 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2253 [2024-08-01 01:08:41,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.59 | bwd_microstep: 5157.85 | bwd_inner_microstep: 4757.50 | bwd_allreduce_microstep: 400.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3921 [2024-08-01 01:08:50,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.16 | bwd_microstep: 5119.07 | bwd_inner_microstep: 5083.03 | bwd_allreduce_microstep: 35.96 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-08-01 01:08:59,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.90 | bwd_microstep: 5018.05 | bwd_inner_microstep: 4981.59 | bwd_allreduce_microstep: 36.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-08-01 01:09:07,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.90 | bwd_microstep: 5063.37 | bwd_inner_microstep: 5001.47 | bwd_allreduce_microstep: 61.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-08-01 01:09:16,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.89 | bwd_microstep: 5174.54 | bwd_inner_microstep: 5119.44 | bwd_allreduce_microstep: 55.03 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2161 [2024-08-01 01:09:25,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.67 | bwd_microstep: 5234.19 | bwd_inner_microstep: 4824.82 | bwd_allreduce_microstep: 409.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-08-01 01:09:34,593] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | optimizer_step: 92.72 [2024-08-01 01:09:34,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.73 | bwd_microstep: 5181.71 | bwd_inner_microstep: 5107.45 | bwd_allreduce_microstep: 74.19 | step_microstep: 182.51 [2024-08-01 01:09:34,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28852.25 | bwd: 41313.11 | bwd_inner: 39828.58 | bwd_allreduce: 1484.05 | step: 183.12 95%|█████████▌| 1169/1230 [22:57:40<1:11:15, 70.09s/it] {'loss': 1.1339, 'learning_rate': 1.2874028668559136e-07, 'epoch': 0.95} 95%|█████████▌| 1169/1230 [22:57:40<1:11:15, 70.09s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830 [2024-08-01 01:09:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.53 | bwd_microstep: 5361.29 | bwd_inner_microstep: 5304.36 | bwd_allreduce_microstep: 56.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807 [2024-08-01 01:09:52,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.83 | bwd_microstep: 5173.01 | bwd_inner_microstep: 5118.80 | bwd_allreduce_microstep: 54.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778 [2024-08-01 01:10:01,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.49 | bwd_microstep: 5011.69 | bwd_inner_microstep: 4992.32 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-08-01 01:10:09,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.15 | bwd_microstep: 5040.73 | bwd_inner_microstep: 4977.99 | bwd_allreduce_microstep: 62.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-08-01 01:10:18,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.99 | bwd_microstep: 4984.24 | 
bwd_inner_microstep: 4964.80 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-08-01 01:10:27,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.85 | bwd_microstep: 4888.90 | bwd_inner_microstep: 4869.56 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-08-01 01:10:35,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3318.43 | bwd_microstep: 4889.07 | bwd_inner_microstep: 4844.91 | bwd_allreduce_microstep: 44.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3665 [2024-08-01 01:10:44,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-08-01 01:10:44,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.62 | bwd_microstep: 4907.67 | bwd_inner_microstep: 4888.29 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.59 [2024-08-01 01:10:44,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29177.80 | bwd: 40256.59 | bwd_inner: 39960.99 | bwd_allreduce: 295.11 | step: 182.17 95%|█████████▌| 1170/1230 [22:58:50<1:09:59, 70.00s/it] {'loss': 1.2054, 'learning_rate': 1.245626052149318e-07, 'epoch': 0.95} 95%|█████████▌| 1170/1230 [22:58:50<1:09:59, 70.00s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3571 [2024-08-01 01:10:53,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.53 | bwd_microstep: 5489.74 | bwd_inner_microstep: 5293.05 | bwd_allreduce_microstep: 196.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-08-01 01:11:02,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.82 | bwd_microstep: 5060.90 | bwd_inner_microstep: 5032.87 | bwd_allreduce_microstep: 27.96 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-08-01 01:11:11,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.60 | bwd_microstep: 5203.08 | bwd_inner_microstep: 5150.17 | bwd_allreduce_microstep: 52.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-08-01 01:11:20,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.23 | bwd_microstep: 5225.88 | bwd_inner_microstep: 4820.13 | bwd_allreduce_microstep: 405.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-08-01 01:11:27,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.52 | bwd_microstep: 4692.90 | bwd_inner_microstep: 4665.42 | bwd_allreduce_microstep: 27.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-08-01 01:11:36,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.93 | bwd_microstep: 5033.84 | bwd_inner_microstep: 4977.30 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-08-01 01:11:45,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.72 | bwd_microstep: 5032.69 | bwd_inner_microstep: 4977.62 | bwd_allreduce_microstep: 55.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3651 [2024-08-01 01:11:53,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 01:11:53,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.83 | bwd_microstep: 5039.59 | bwd_inner_microstep: 4965.68 | bwd_allreduce_microstep: 73.84 | step_microstep: 182.04 [2024-08-01 01:11:53,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28486.10 | bwd: 40778.59 | 
bwd_inner: 39882.20 | bwd_allreduce: 895.92 | step: 182.62 95%|█████████▌| 1171/1230 [22:59:59<1:08:42, 69.88s/it] {'loss': 1.1727, 'learning_rate': 1.2045340550961958e-07, 'epoch': 0.95} 95%|█████████▌| 1171/1230 [22:59:59<1:08:42, 69.88s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2135 [2024-08-01 01:12:02,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.28 | bwd_microstep: 5347.22 | bwd_inner_microstep: 4934.62 | bwd_allreduce_microstep: 412.53 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2028 [2024-08-01 01:12:11,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.52 | bwd_microstep: 5231.64 | bwd_inner_microstep: 4828.52 | bwd_allreduce_microstep: 403.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3824 [2024-08-01 01:12:20,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.71 | bwd_microstep: 5190.57 | bwd_inner_microstep: 5124.62 | bwd_allreduce_microstep: 65.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-08-01 01:12:29,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.56 | bwd_microstep: 5213.30 | bwd_inner_microstep: 4809.80 | bwd_allreduce_microstep: 403.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-08-01 01:12:38,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.77 | bwd_microstep: 5244.06 | bwd_inner_microstep: 4835.58 | bwd_allreduce_microstep: 408.41 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2891 [2024-08-01 01:12:46,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.10 | bwd_microstep: 5094.31 | bwd_inner_microstep: 4760.81 | bwd_allreduce_microstep: 333.43 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-08-01 01:12:55,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.18 | bwd_microstep: 5066.97 | bwd_inner_microstep: 4673.84 | bwd_allreduce_microstep: 393.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-08-01 01:13:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-08-01 01:13:04,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.57 | bwd_microstep: 4883.48 | bwd_inner_microstep: 4864.14 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.67 [2024-08-01 01:13:04,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28508.59 | bwd: 41271.54 | bwd_inner: 38831.87 | bwd_allreduce: 2439.18 | step: 182.26 95%|█████████▌| 1172/1230 [23:01:09<1:07:36, 69.95s/it] {'loss': 1.1806, 'learning_rate': 1.164127160651285e-07, 'epoch': 0.95} 95%|█████████▌| 1172/1230 [23:01:09<1:07:36, 69.95s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 01:13:13,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3861.20 | bwd_microstep: 5346.80 | bwd_inner_microstep: 5327.75 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3793 [2024-08-01 01:13:22,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.22 | bwd_microstep: 5174.76 | bwd_inner_microstep: 5106.50 | bwd_allreduce_microstep: 68.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3831 [2024-08-01 01:13:30,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3367.85 | bwd_microstep: 5048.21 | bwd_inner_microstep: 4991.31 | bwd_allreduce_microstep: 56.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, 
dynamic token length: 2183 [2024-08-01 01:13:39,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.91 | bwd_microstep: 5229.90 | bwd_inner_microstep: 4823.00 | bwd_allreduce_microstep: 406.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-08-01 01:13:47,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.09 | bwd_microstep: 5036.71 | bwd_inner_microstep: 4975.06 | bwd_allreduce_microstep: 61.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-08-01 01:13:56,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.43 | bwd_microstep: 4854.74 | bwd_inner_microstep: 4806.79 | bwd_allreduce_microstep: 47.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3658 [2024-08-01 01:14:04,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.67 | bwd_microstep: 5072.59 | bwd_inner_microstep: 5023.24 | bwd_allreduce_microstep: 49.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3686 [2024-08-01 01:14:13,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-08-01 01:14:13,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3092.57 | bwd_microstep: 4879.18 | bwd_inner_microstep: 4834.91 | bwd_allreduce_microstep: 44.20 | step_microstep: 181.20 [2024-08-01 01:14:13,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28042.86 | bwd: 40642.88 | bwd_inner: 39888.49 | bwd_allreduce: 753.89 | step: 181.78 95%|█████████▌| 1173/1230 [23:02:18<1:06:11, 69.67s/it] {'loss': 1.1502, 'learning_rate': 1.1244056490184008e-07, 'epoch': 0.95} 95%|█████████▌| 1173/1230 [23:02:18<1:06:11, 69.67s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-08-01 01:14:22,029] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.83 | bwd_microstep: 5328.36 | bwd_inner_microstep: 5262.07 | bwd_allreduce_microstep: 66.22 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3167 [2024-08-01 01:14:30,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.34 | bwd_microstep: 5187.07 | bwd_inner_microstep: 4905.22 | bwd_allreduce_microstep: 281.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-08-01 01:14:39,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.79 | bwd_microstep: 5263.09 | bwd_inner_microstep: 5170.86 | bwd_allreduce_microstep: 92.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-08-01 01:14:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.63 | bwd_microstep: 4977.86 | bwd_inner_microstep: 4958.55 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3723 [2024-08-01 01:14:56,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3124.73 | bwd_microstep: 4921.92 | bwd_inner_microstep: 4884.32 | bwd_allreduce_microstep: 37.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-08-01 01:15:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.90 | bwd_microstep: 4986.72 | bwd_inner_microstep: 4933.18 | bwd_allreduce_microstep: 53.48 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3704 [2024-08-01 01:15:13,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.83 | bwd_microstep: 5028.03 | bwd_inner_microstep: 4964.20 | bwd_allreduce_microstep: 63.76 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3660 [2024-08-01 01:15:22,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-08-01 01:15:22,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.43 | bwd_microstep: 5060.41 | bwd_inner_microstep: 4998.60 | bwd_allreduce_microstep: 61.74 | step_microstep: 182.55 [2024-08-01 01:15:22,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28358.38 | bwd: 40753.44 | bwd_inner: 40076.94 | bwd_allreduce: 676.00 | step: 183.15 95%|█████████▌| 1174/1230 [23:03:28<1:04:57, 69.60s/it] {'loss': 1.1204, 'learning_rate': 1.0853697956485942e-07, 'epoch': 0.95} 95%|█████████▌| 1174/1230 [23:03:28<1:04:57, 69.60s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3823 [2024-08-01 01:15:31,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.79 | bwd_microstep: 5224.96 | bwd_inner_microstep: 5183.33 | bwd_allreduce_microstep: 41.56 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3864 [2024-08-01 01:15:40,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3787.87 | bwd_microstep: 5107.15 | bwd_inner_microstep: 5087.83 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3798 [2024-08-01 01:15:49,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.69 | bwd_microstep: 5120.65 | bwd_inner_microstep: 5058.57 | bwd_allreduce_microstep: 62.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-08-01 01:15:57,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.20 | bwd_microstep: 5186.75 | bwd_inner_microstep: 5127.50 | bwd_allreduce_microstep: 59.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-08-01 01:16:06,738] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.88 | bwd_microstep: 5032.53 | bwd_inner_microstep: 5009.57 | bwd_allreduce_microstep: 22.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-08-01 01:16:15,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.00 | bwd_microstep: 5052.83 | bwd_inner_microstep: 4989.21 | bwd_allreduce_microstep: 63.54 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3673 [2024-08-01 01:16:24,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.28 | bwd_microstep: 5166.67 | bwd_inner_microstep: 5077.84 | bwd_allreduce_microstep: 88.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 01:16:33,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-08-01 01:16:33,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.68 | bwd_microstep: 5067.40 | bwd_inner_microstep: 5006.86 | bwd_allreduce_microstep: 60.48 | step_microstep: 182.02 [2024-08-01 01:16:33,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29202.30 | bwd: 40958.92 | bwd_inner: 40540.65 | bwd_allreduce: 417.78 | step: 182.72 96%|█████████▌| 1175/1230 [23:04:38<1:04:02, 69.87s/it] {'loss': 1.1163, 'learning_rate': 1.0470198712381086e-07, 'epoch': 0.96} 96%|█████████▌| 1175/1230 [23:04:38<1:04:02, 69.87s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3701 [2024-08-01 01:16:42,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.92 | bwd_microstep: 5506.76 | bwd_inner_microstep: 5370.55 | bwd_allreduce_microstep: 136.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-08-01 01:16:51,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.04 | 
bwd_microstep: 5146.34 | bwd_inner_microstep: 5120.11 | bwd_allreduce_microstep: 26.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2247 [2024-08-01 01:16:59,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.06 | bwd_microstep: 5224.20 | bwd_inner_microstep: 4818.80 | bwd_allreduce_microstep: 405.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-08-01 01:17:08,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.41 | bwd_microstep: 5267.50 | bwd_inner_microstep: 5209.00 | bwd_allreduce_microstep: 58.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-08-01 01:17:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.54 | bwd_microstep: 5290.40 | bwd_inner_microstep: 4878.92 | bwd_allreduce_microstep: 411.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-08-01 01:17:26,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.44 | bwd_microstep: 5023.69 | bwd_inner_microstep: 5000.36 | bwd_allreduce_microstep: 23.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3736 [2024-08-01 01:17:35,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.96 | bwd_microstep: 5110.41 | bwd_inner_microstep: 5035.79 | bwd_allreduce_microstep: 74.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-08-01 01:17:43,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 01:17:43,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3436.34 | bwd_microstep: 5017.40 | bwd_inner_microstep: 4629.94 | bwd_allreduce_microstep: 387.39 | step_microstep: 183.37 [2024-08-01 
01:17:43,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28980.62 | bwd: 41586.68 | bwd_inner: 40063.39 | bwd_allreduce: 1522.82 | step: 183.94 96%|█████████▌| 1176/1230 [23:05:49<1:03:09, 70.18s/it] {'loss': 1.142, 'learning_rate': 1.009356141726625e-07, 'epoch': 0.96} 96%|█████████▌| 1176/1230 [23:05:49<1:03:09, 70.18s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3559 [2024-08-01 01:17:52,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.91 | bwd_microstep: 5123.27 | bwd_inner_microstep: 5014.77 | bwd_allreduce_microstep: 108.44 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2043 [2024-08-01 01:18:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.09 | bwd_microstep: 5060.83 | bwd_inner_microstep: 4669.39 | bwd_allreduce_microstep: 391.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3783 [2024-08-01 01:18:09,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.00 | bwd_microstep: 5014.72 | bwd_inner_microstep: 4995.38 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3699 [2024-08-01 01:18:18,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.64 | bwd_microstep: 5176.39 | bwd_inner_microstep: 5087.83 | bwd_allreduce_microstep: 88.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-08-01 01:18:27,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.61 | bwd_microstep: 5032.02 | bwd_inner_microstep: 5009.43 | bwd_allreduce_microstep: 22.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-08-01 01:18:36,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.45 | 
bwd_microstep: 4976.72 | bwd_inner_microstep: 4957.32 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-08-01 01:18:44,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3486.48 | bwd_microstep: 5062.52 | bwd_inner_microstep: 4668.60 | bwd_allreduce_microstep: 393.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-08-01 01:18:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-08-01 01:18:53,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.47 | bwd_microstep: 4900.37 | bwd_inner_microstep: 4881.01 | bwd_allreduce_microstep: 19.29 | step_microstep: 181.44 [2024-08-01 01:18:53,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28800.55 | bwd: 40346.82 | bwd_inner: 39283.67 | bwd_allreduce: 1062.67 | step: 182.01 96%|█████████▌| 1177/1230 [23:06:59<1:01:48, 69.97s/it] {'loss': 1.1344, 'learning_rate': 9.723788682953428e-08, 'epoch': 0.96} 96%|█████████▌| 1177/1230 [23:06:59<1:01:48, 69.97s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4058 [2024-08-01 01:19:01,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3367.81 | bwd_microstep: 5134.33 | bwd_inner_microstep: 5115.24 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3961 [2024-08-01 01:19:10,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3402.60 | bwd_microstep: 5066.27 | bwd_inner_microstep: 5019.71 | bwd_allreduce_microstep: 46.49 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3755 [2024-08-01 01:19:19,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.13 | bwd_microstep: 5197.20 | bwd_inner_microstep: 5156.66 | 
bwd_allreduce_microstep: 40.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-08-01 01:19:27,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.25 | bwd_microstep: 5103.69 | bwd_inner_microstep: 5059.21 | bwd_allreduce_microstep: 44.42 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1072 [2024-08-01 01:19:36,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.87 | bwd_microstep: 5180.18 | bwd_inner_microstep: 4782.58 | bwd_allreduce_microstep: 397.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-08-01 01:19:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.99 | bwd_microstep: 5224.21 | bwd_inner_microstep: 5106.75 | bwd_allreduce_microstep: 117.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-08-01 01:19:54,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.62 | bwd_microstep: 5059.35 | bwd_inner_microstep: 4999.99 | bwd_allreduce_microstep: 59.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-08-01 01:20:03,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-08-01 01:20:03,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.84 | bwd_microstep: 4993.53 | bwd_inner_microstep: 4938.88 | bwd_allreduce_microstep: 54.58 | step_microstep: 182.80 [2024-08-01 01:20:03,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28352.02 | bwd: 40958.75 | bwd_inner: 40178.96 | bwd_allreduce: 779.31 | step: 183.38 96%|█████████▌| 1178/1230 [23:08:08<1:00:33, 69.87s/it] {'loss': 1.1607, 'learning_rate': 9.360883073652238e-08, 'epoch': 0.96} 96%|█████████▌| 1178/1230 [23:08:08<1:00:33, 69.87s/it]dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 4043 [2024-08-01 01:20:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3847.81 | bwd_microstep: 5321.55 | bwd_inner_microstep: 5302.44 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-08-01 01:20:20,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.83 | bwd_microstep: 5173.06 | bwd_inner_microstep: 4771.65 | bwd_allreduce_microstep: 401.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-08-01 01:20:29,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.28 | bwd_microstep: 5158.09 | bwd_inner_microstep: 5082.42 | bwd_allreduce_microstep: 75.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-08-01 01:20:38,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.51 | bwd_microstep: 5163.59 | bwd_inner_microstep: 4761.47 | bwd_allreduce_microstep: 402.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-08-01 01:20:46,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3184.69 | bwd_microstep: 4793.80 | bwd_inner_microstep: 4774.46 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-08-01 01:20:55,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 4009.55 | bwd_microstep: 4900.34 | bwd_inner_microstep: 4878.72 | bwd_allreduce_microstep: 21.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3824 [2024-08-01 01:21:04,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.88 | bwd_microstep: 5063.14 | bwd_inner_microstep: 5043.86 | 
bwd_allreduce_microstep: 19.21 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2134 [2024-08-01 01:21:13,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-08-01 01:21:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.85 | bwd_microstep: 5166.62 | bwd_inner_microstep: 4763.84 | bwd_allreduce_microstep: 402.72 | step_microstep: 181.41 [2024-08-01 01:21:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28953.29 | bwd: 40740.18 | bwd_inner: 39378.81 | bwd_allreduce: 1360.89 | step: 181.99 96%|█████████▌| 1179/1230 [23:09:18<59:25, 69.92s/it] {'loss': 1.15, 'learning_rate': 9.004847105951509e-08, 'epoch': 0.96} 96%|█████████▌| 1179/1230 [23:09:18<59:25, 69.92s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 01:21:22,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.53 | bwd_microstep: 5328.23 | bwd_inner_microstep: 5308.98 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-08-01 01:21:31,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.82 | bwd_microstep: 5171.65 | bwd_inner_microstep: 5150.06 | bwd_allreduce_microstep: 21.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-08-01 01:21:40,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.08 | bwd_microstep: 5065.46 | bwd_inner_microstep: 5042.96 | bwd_allreduce_microstep: 22.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793 [2024-08-01 01:21:48,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.31 | bwd_microstep: 5211.57 | bwd_inner_microstep: 5158.33 | bwd_allreduce_microstep: 53.17 | step_microstep: 0.10 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3732 [2024-08-01 01:21:57,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.29 | bwd_microstep: 4989.12 | bwd_inner_microstep: 4969.64 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-08-01 01:22:05,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.31 | bwd_microstep: 4747.86 | bwd_inner_microstep: 4722.72 | bwd_allreduce_microstep: 25.07 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-08-01 01:22:13,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.01 | bwd_microstep: 4820.58 | bwd_inner_microstep: 4783.57 | bwd_allreduce_microstep: 36.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-08-01 01:22:21,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-08-01 01:22:21,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3249.37 | bwd_microstep: 4839.87 | bwd_inner_microstep: 4815.89 | bwd_allreduce_microstep: 23.91 | step_microstep: 182.68 [2024-08-01 01:22:21,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28407.62 | bwd: 40174.32 | bwd_inner: 39952.10 | bwd_allreduce: 221.73 | step: 183.38 96%|█████████▌| 1180/1230 [23:10:27<58:00, 69.62s/it] {'loss': 1.1318, 'learning_rate': 8.655683248802282e-08, 'epoch': 0.96} 96%|█████████▌| 1180/1230 [23:10:27<58:00, 69.62s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3923 [2024-08-01 01:22:31,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.22 | bwd_microstep: 5518.28 | bwd_inner_microstep: 5420.67 | bwd_allreduce_microstep: 97.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3992 [2024-08-01 
01:22:39,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3327.10 | bwd_microstep: 5052.31 | bwd_inner_microstep: 5032.88 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-08-01 01:22:48,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.33 | bwd_microstep: 5230.10 | bwd_inner_microstep: 5145.35 | bwd_allreduce_microstep: 84.68 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-08-01 01:22:57,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.32 | bwd_microstep: 5318.02 | bwd_inner_microstep: 5243.67 | bwd_allreduce_microstep: 74.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-08-01 01:23:06,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.70 | bwd_microstep: 5054.12 | bwd_inner_microstep: 5027.10 | bwd_allreduce_microstep: 26.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-08-01 01:23:15,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.64 | bwd_microstep: 4950.94 | bwd_inner_microstep: 4918.23 | bwd_allreduce_microstep: 32.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-08-01 01:23:23,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.83 | bwd_microstep: 4895.11 | bwd_inner_microstep: 4875.73 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-08-01 01:23:32,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-08-01 01:23:32,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.98 | bwd_microstep: 5089.00 | 
bwd_inner_microstep: 4693.00 | bwd_allreduce_microstep: 395.93 | step_microstep: 183.16 [2024-08-01 01:23:32,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29017.04 | bwd: 41107.85 | bwd_inner: 40356.57 | bwd_allreduce: 750.79 | step: 183.76 96%|█████████▌| 1181/1230 [23:11:38<57:03, 69.87s/it] {'loss': 1.1051, 'learning_rate': 8.313393923500613e-08, 'epoch': 0.96} 96%|█████████▌| 1181/1230 [23:11:38<57:03, 69.87s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2351 [2024-08-01 01:23:41,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.72 | bwd_microstep: 5274.01 | bwd_inner_microstep: 4866.58 | bwd_allreduce_microstep: 407.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 01:23:50,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.88 | bwd_microstep: 5139.52 | bwd_inner_microstep: 5072.88 | bwd_allreduce_microstep: 66.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-08-01 01:23:59,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.64 | bwd_microstep: 5512.29 | bwd_inner_microstep: 5086.51 | bwd_allreduce_microstep: 425.71 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2219 [2024-08-01 01:24:07,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.03 | bwd_microstep: 5131.98 | bwd_inner_microstep: 4735.25 | bwd_allreduce_microstep: 396.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-08-01 01:24:16,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.59 | bwd_microstep: 5155.67 | bwd_inner_microstep: 5103.70 | bwd_allreduce_microstep: 51.89 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3717 [2024-08-01 
01:24:25,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.48 | bwd_microstep: 5073.22 | bwd_inner_microstep: 5012.01 | bwd_allreduce_microstep: 61.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-08-01 01:24:34,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.48 | bwd_microstep: 5218.19 | bwd_inner_microstep: 4811.89 | bwd_allreduce_microstep: 406.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-08-01 01:24:43,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 01:24:43,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.03 | bwd_microstep: 5113.16 | bwd_inner_microstep: 4715.31 | bwd_allreduce_microstep: 397.78 | step_microstep: 181.42 [2024-08-01 01:24:43,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28625.75 | bwd: 41618.01 | bwd_inner: 39404.07 | bwd_allreduce: 2213.46 | step: 182.01 96%|█████████▌| 1182/1230 [23:12:48<56:03, 70.08s/it] {'loss': 1.1328, 'learning_rate': 7.977981503670795e-08, 'epoch': 0.96} 96%|█████████▌| 1182/1230 [23:12:48<56:03, 70.08s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3548 [2024-08-01 01:24:51,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.28 | bwd_microstep: 5074.60 | bwd_inner_microstep: 5002.89 | bwd_allreduce_microstep: 71.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-08-01 01:24:59,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3040.35 | bwd_microstep: 4978.49 | bwd_inner_microstep: 4595.56 | bwd_allreduce_microstep: 382.86 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2238 [2024-08-01 01:25:07,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3017.59 | bwd_microstep: 4938.33 | bwd_inner_microstep: 4557.69 | bwd_allreduce_microstep: 380.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-08-01 01:25:16,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.46 | bwd_microstep: 5028.48 | bwd_inner_microstep: 5001.26 | bwd_allreduce_microstep: 27.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-08-01 01:25:24,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.06 | bwd_microstep: 5187.71 | bwd_inner_microstep: 4785.91 | bwd_allreduce_microstep: 401.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-08-01 01:25:33,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.07 | bwd_microstep: 4969.93 | bwd_inner_microstep: 4950.60 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3658 [2024-08-01 01:25:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.65 | bwd_microstep: 5022.44 | bwd_inner_microstep: 4948.16 | bwd_allreduce_microstep: 74.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-08-01 01:25:51,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 01:25:51,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.94 | bwd_microstep: 5175.52 | bwd_inner_microstep: 4772.11 | bwd_allreduce_microstep: 403.35 | step_microstep: 181.78 [2024-08-01 01:25:51,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27347.30 | bwd: 40375.49 | bwd_inner: 38614.12 | bwd_allreduce: 1760.88 | step: 182.36 96%|█████████▌| 1183/1230 [23:13:56<54:25, 69.47s/it] {'loss': 1.1228, 'learning_rate': 7.64944831524872e-08, 'epoch': 
0.96} 96%|█████████▌| 1183/1230 [23:13:56<54:25, 69.47s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-08-01 01:26:00,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3826.11 | bwd_microstep: 5283.29 | bwd_inner_microstep: 5236.40 | bwd_allreduce_microstep: 46.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3859 [2024-08-01 01:26:09,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.77 | bwd_microstep: 5110.56 | bwd_inner_microstep: 5088.34 | bwd_allreduce_microstep: 22.16 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3819 [2024-08-01 01:26:17,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.91 | bwd_microstep: 5066.48 | bwd_inner_microstep: 5017.49 | bwd_allreduce_microstep: 48.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-08-01 01:26:26,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.47 | bwd_microstep: 5120.94 | bwd_inner_microstep: 5049.00 | bwd_allreduce_microstep: 71.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-08-01 01:26:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.85 | bwd_microstep: 5248.85 | bwd_inner_microstep: 4841.33 | bwd_allreduce_microstep: 407.45 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3639 [2024-08-01 01:26:43,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.28 | bwd_microstep: 5059.18 | bwd_inner_microstep: 4980.90 | bwd_allreduce_microstep: 78.21 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-08-01 01:26:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3574.11 | bwd_microstep: 5069.93 | bwd_inner_microstep: 5010.48 | bwd_allreduce_microstep: 59.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-08-01 01:27:01,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 01:27:01,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.59 | bwd_microstep: 5323.46 | bwd_inner_microstep: 5204.74 | bwd_allreduce_microstep: 118.65 | step_microstep: 183.79 [2024-08-01 01:27:01,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29033.99 | bwd: 41282.68 | bwd_inner: 40428.61 | bwd_allreduce: 853.59 | step: 184.39 96%|█████████▋| 1184/1230 [23:15:07<53:31, 69.82s/it] {'loss': 1.1368, 'learning_rate': 7.327796636465767e-08, 'epoch': 0.96} 96%|█████████▋| 1184/1230 [23:15:07<53:31, 69.82s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3896 [2024-08-01 01:27:11,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.35 | bwd_microstep: 5552.06 | bwd_inner_microstep: 5463.42 | bwd_allreduce_microstep: 88.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-08-01 01:27:19,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.31 | bwd_microstep: 5136.50 | bwd_inner_microstep: 5087.12 | bwd_allreduce_microstep: 49.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-08-01 01:27:28,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.69 | bwd_microstep: 5464.09 | bwd_inner_microstep: 5043.75 | bwd_allreduce_microstep: 420.27 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2080 [2024-08-01 01:27:37,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.34 | bwd_microstep: 5296.32 | bwd_inner_microstep: 4885.28 | 
bwd_allreduce_microstep: 410.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-08-01 01:27:46,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.11 | bwd_microstep: 5159.10 | bwd_inner_microstep: 5084.90 | bwd_allreduce_microstep: 74.13 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3637 [2024-08-01 01:27:55,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.53 | bwd_microstep: 5171.55 | bwd_inner_microstep: 5076.69 | bwd_allreduce_microstep: 94.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-08-01 01:28:03,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.60 | bwd_microstep: 4984.53 | bwd_inner_microstep: 4935.02 | bwd_allreduce_microstep: 49.44 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2111 [2024-08-01 01:28:12,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-08-01 01:28:12,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.01 | bwd_microstep: 5118.69 | bwd_inner_microstep: 4722.80 | bwd_allreduce_microstep: 395.81 | step_microstep: 181.45 [2024-08-01 01:28:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28775.84 | bwd: 41882.80 | bwd_inner: 40298.92 | bwd_allreduce: 1583.38 | step: 182.16 96%|█████████▋| 1185/1230 [23:16:18<52:37, 70.17s/it] {'loss': 1.1083, 'learning_rate': 7.01302869783338e-08, 'epoch': 0.96} 96%|█████████▋| 1185/1230 [23:16:18<52:37, 70.17s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2773 [2024-08-01 01:28:22,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.30 | bwd_microstep: 5594.08 | bwd_inner_microstep: 5164.98 | bwd_allreduce_microstep: 429.04 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3761 [2024-08-01 01:28:30,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.66 | bwd_microstep: 5083.01 | bwd_inner_microstep: 5051.61 | bwd_allreduce_microstep: 31.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3904 [2024-08-01 01:28:39,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.15 | bwd_microstep: 5272.51 | bwd_inner_microstep: 5220.84 | bwd_allreduce_microstep: 51.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-08-01 01:28:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.60 | bwd_microstep: 5367.50 | bwd_inner_microstep: 4954.50 | bwd_allreduce_microstep: 412.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-08-01 01:28:57,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.69 | bwd_microstep: 5152.29 | bwd_inner_microstep: 5066.67 | bwd_allreduce_microstep: 85.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-08-01 01:29:06,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.70 | bwd_microstep: 5060.64 | bwd_inner_microstep: 5020.32 | bwd_allreduce_microstep: 40.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-08-01 01:29:14,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.59 | bwd_microstep: 5089.61 | bwd_inner_microstep: 4697.10 | bwd_allreduce_microstep: 392.44 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3648 [2024-08-01 01:29:23,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-08-01 01:29:23,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3599.32 | bwd_microstep: 5080.28 | bwd_inner_microstep: 5004.24 | bwd_allreduce_microstep: 75.97 | step_microstep: 181.47 [2024-08-01 01:29:23,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29053.93 | bwd: 41699.91 | bwd_inner: 40180.21 | bwd_allreduce: 1519.21 | step: 182.05 96%|█████████▋| 1186/1230 [23:17:29<51:39, 70.45s/it] {'loss': 1.1316, 'learning_rate': 6.705146682127184e-08, 'epoch': 0.96} 96%|█████████▋| 1186/1230 [23:17:29<51:39, 70.45s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2444 [2024-08-01 01:29:32,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.44 | bwd_microstep: 5423.48 | bwd_inner_microstep: 5007.91 | bwd_allreduce_microstep: 415.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3870 [2024-08-01 01:29:41,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.99 | bwd_microstep: 5117.80 | bwd_inner_microstep: 5098.42 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3757 [2024-08-01 01:29:50,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.53 | bwd_microstep: 5032.45 | bwd_inner_microstep: 5005.54 | bwd_allreduce_microstep: 26.84 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-08-01 01:29:58,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.23 | bwd_microstep: 5014.90 | bwd_inner_microstep: 4625.18 | bwd_allreduce_microstep: 389.66 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3684 [2024-08-01 01:30:06,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3073.76 | bwd_microstep: 4841.33 | bwd_inner_microstep: 4796.71 | bwd_allreduce_microstep: 44.55 | step_microstep: 0.09 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3619 [2024-08-01 01:30:15,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.87 | bwd_microstep: 5007.30 | bwd_inner_microstep: 4947.76 | bwd_allreduce_microstep: 59.48 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3672 [2024-08-01 01:30:24,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.40 | bwd_microstep: 4994.87 | bwd_inner_microstep: 4926.69 | bwd_allreduce_microstep: 68.11 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2120 [2024-08-01 01:30:32,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 01:30:32,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.94 | bwd_microstep: 5105.15 | bwd_inner_microstep: 4708.83 | bwd_allreduce_microstep: 396.25 | step_microstep: 181.81 [2024-08-01 01:30:32,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28219.05 | bwd: 40537.25 | bwd_inner: 39116.97 | bwd_allreduce: 1419.80 | step: 182.51 97%|█████████▋| 1187/1230 [23:18:38<50:11, 70.04s/it] {'loss': 1.1411, 'learning_rate': 6.404152724371892e-08, 'epoch': 0.96} 97%|█████████▋| 1187/1230 [23:18:38<50:11, 70.04s/it]dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 4011 [2024-08-01 01:30:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.64 | bwd_microstep: 5343.72 | bwd_inner_microstep: 5312.40 | bwd_allreduce_microstep: 31.26 | step_microstep: 0.09 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3737 [2024-08-01 01:30:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.68 | bwd_microstep: 5380.93 | bwd_inner_microstep: 5313.01 | bwd_allreduce_microstep: 67.85 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2846 [2024-08-01 01:30:59,656] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.72 | bwd_microstep: 5031.39 | bwd_inner_microstep: 4639.33 | bwd_allreduce_microstep: 391.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-08-01 01:31:08,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3793.36 | bwd_microstep: 5034.64 | bwd_inner_microstep: 5008.31 | bwd_allreduce_microstep: 26.27 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2107 [2024-08-01 01:31:16,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3003.75 | bwd_microstep: 4871.40 | bwd_inner_microstep: 4499.76 | bwd_allreduce_microstep: 371.57 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3660 [2024-08-01 01:31:24,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.73 | bwd_microstep: 4908.75 | bwd_inner_microstep: 4858.22 | bwd_allreduce_microstep: 50.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-08-01 01:31:32,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3436.16 | bwd_microstep: 5020.26 | bwd_inner_microstep: 4629.51 | bwd_allreduce_microstep: 390.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-08-01 01:31:41,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 01:31:41,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.74 | bwd_microstep: 5090.77 | bwd_inner_microstep: 5029.29 | bwd_allreduce_microstep: 61.41 | step_microstep: 181.34 [2024-08-01 01:31:41,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27867.70 | bwd: 40681.85 | bwd_inner: 39289.76 | bwd_allreduce: 1391.60 | step: 181.94 97%|█████████▋| 1188/1230 [23:19:47<48:47, 69.69s/it] {'loss': 1.121, 
'learning_rate': 6.110048911826871e-08, 'epoch': 0.97} 97%|█████████▋| 1188/1230 [23:19:47<48:47, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3945 [2024-08-01 01:31:51,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.21 | bwd_microstep: 5566.34 | bwd_inner_microstep: 5477.09 | bwd_allreduce_microstep: 89.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3812 [2024-08-01 01:32:00,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.76 | bwd_microstep: 5291.77 | bwd_inner_microstep: 5206.71 | bwd_allreduce_microstep: 85.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 01:32:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3376.57 | bwd_microstep: 5032.42 | bwd_inner_microstep: 4980.03 | bwd_allreduce_microstep: 52.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-08-01 01:32:17,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.62 | bwd_microstep: 5213.06 | bwd_inner_microstep: 4807.33 | bwd_allreduce_microstep: 405.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-08-01 01:32:25,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.54 | bwd_microstep: 5176.11 | bwd_inner_microstep: 4773.62 | bwd_allreduce_microstep: 402.42 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-08-01 01:32:34,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.34 | bwd_microstep: 5031.85 | bwd_inner_microstep: 4643.76 | bwd_allreduce_microstep: 388.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-08-01 01:32:42,766] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.04 | bwd_microstep: 5064.94 | bwd_inner_microstep: 5003.17 | bwd_allreduce_microstep: 61.71 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-08-01 01:32:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-08-01 01:32:51,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.61 | bwd_microstep: 5016.99 | bwd_inner_microstep: 4962.23 | bwd_allreduce_microstep: 54.69 | step_microstep: 181.61 [2024-08-01 01:32:51,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28046.60 | bwd: 41393.47 | bwd_inner: 39853.88 | bwd_allreduce: 1539.11 | step: 182.21 97%|█████████▋| 1189/1230 [23:20:57<47:38, 69.71s/it] {'loss': 1.0985, 'learning_rate': 5.82283728397115e-08, 'epoch': 0.97} 97%|█████████▋| 1189/1230 [23:20:57<47:38, 69.71s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3941 [2024-08-01 01:33:00,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3877.96 | bwd_microstep: 5478.55 | bwd_inner_microstep: 5417.14 | bwd_allreduce_microstep: 61.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3817 [2024-08-01 01:33:10,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3835.83 | bwd_microstep: 5339.51 | bwd_inner_microstep: 5287.81 | bwd_allreduce_microstep: 51.62 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-08-01 01:33:19,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.74 | bwd_microstep: 5097.79 | bwd_inner_microstep: 5070.69 | bwd_allreduce_microstep: 27.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-08-01 01:33:27,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.34 | 
bwd_microstep: 5055.30 | bwd_inner_microstep: 5027.55 | bwd_allreduce_microstep: 27.67 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2078 [2024-08-01 01:33:36,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.21 | bwd_microstep: 5218.28 | bwd_inner_microstep: 4812.15 | bwd_allreduce_microstep: 406.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-08-01 01:33:45,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.37 | bwd_microstep: 4877.62 | bwd_inner_microstep: 4858.26 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-08-01 01:33:53,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.93 | bwd_microstep: 4713.90 | bwd_inner_microstep: 4689.34 | bwd_allreduce_microstep: 24.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-08-01 01:34:02,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-08-01 01:34:02,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.04 | bwd_microstep: 5065.47 | bwd_inner_microstep: 5002.85 | bwd_allreduce_microstep: 62.55 | step_microstep: 411.97 [2024-08-01 01:34:02,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29286.33 | bwd: 40846.39 | bwd_inner: 40165.74 | bwd_allreduce: 680.16 | step: 412.56 97%|█████████▋| 1190/1230 [23:22:08<46:40, 70.01s/it] {'loss': 1.1362, 'learning_rate': 5.542519832489546e-08, 'epoch': 0.97} 97%|█████████▋| 1190/1230 [23:22:08<46:40, 70.01s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 4045 [2024-08-01 01:34:11,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.21 | bwd_microstep: 5066.59 | bwd_inner_microstep: 5041.90 | 
bwd_allreduce_microstep: 24.62 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-08-01 01:34:19,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.48 | bwd_microstep: 5164.31 | bwd_inner_microstep: 5109.14 | bwd_allreduce_microstep: 55.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-08-01 01:34:28,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.19 | bwd_microstep: 5164.80 | bwd_inner_microstep: 5087.70 | bwd_allreduce_microstep: 77.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-08-01 01:34:36,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3040.92 | bwd_microstep: 4967.35 | bwd_inner_microstep: 4585.46 | bwd_allreduce_microstep: 381.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-08-01 01:34:44,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.37 | bwd_microstep: 4764.93 | bwd_inner_microstep: 4738.63 | bwd_allreduce_microstep: 26.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-08-01 01:34:53,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.02 | bwd_microstep: 5115.48 | bwd_inner_microstep: 5069.56 | bwd_allreduce_microstep: 45.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-08-01 01:35:02,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.82 | bwd_microstep: 4936.20 | bwd_inner_microstep: 4911.57 | bwd_allreduce_microstep: 24.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-08-01 01:35:10,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 
[2024-08-01 01:35:10,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.54 | bwd_microstep: 5159.27 | bwd_inner_microstep: 4757.81 | bwd_allreduce_microstep: 401.38 | step_microstep: 182.28 [2024-08-01 01:35:10,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27999.45 | bwd: 40338.91 | bwd_inner: 39301.70 | bwd_allreduce: 1036.74 | step: 182.87 97%|█████████▋| 1191/1230 [23:23:16<45:14, 69.61s/it] {'loss': 1.1123, 'learning_rate': 5.269098501259007e-08, 'epoch': 0.97} 97%|█████████▋| 1191/1230 [23:23:16<45:14, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-08-01 01:35:20,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.29 | bwd_microstep: 5567.54 | bwd_inner_microstep: 5384.69 | bwd_allreduce_microstep: 182.78 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-08-01 01:35:29,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.31 | bwd_microstep: 5289.03 | bwd_inner_microstep: 4879.54 | bwd_allreduce_microstep: 409.42 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3607 [2024-08-01 01:35:37,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.22 | bwd_microstep: 5107.21 | bwd_inner_microstep: 5020.39 | bwd_allreduce_microstep: 86.76 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-08-01 01:35:46,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.46 | bwd_microstep: 5177.52 | bwd_inner_microstep: 5091.36 | bwd_allreduce_microstep: 86.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-08-01 01:35:55,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.50 | bwd_microstep: 5025.58 | bwd_inner_microstep: 4966.12 | 
bwd_allreduce_microstep: 59.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-08-01 01:36:03,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.22 | bwd_microstep: 4967.18 | bwd_inner_microstep: 4919.85 | bwd_allreduce_microstep: 47.27 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3665 [2024-08-01 01:36:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.18 | bwd_microstep: 5041.08 | bwd_inner_microstep: 4968.33 | bwd_allreduce_microstep: 72.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-08-01 01:36:21,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.82 [2024-08-01 01:36:21,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.22 | bwd_microstep: 5047.02 | bwd_inner_microstep: 5006.08 | bwd_allreduce_microstep: 40.88 | step_microstep: 181.56 [2024-08-01 01:36:21,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28793.31 | bwd: 41222.16 | bwd_inner: 40236.29 | bwd_allreduce: 985.37 | step: 182.26 97%|█████████▋| 1192/1230 [23:24:27<44:13, 69.83s/it] {'loss': 1.1148, 'learning_rate': 5.002575186334735e-08, 'epoch': 0.97} 97%|█████████▋| 1192/1230 [23:24:27<44:13, 69.83s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-08-01 01:36:30,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.53 | bwd_microstep: 5484.73 | bwd_inner_microstep: 5346.63 | bwd_allreduce_microstep: 138.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-08-01 01:36:39,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3806.29 | bwd_microstep: 5153.50 | bwd_inner_microstep: 5134.21 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2240 [2024-08-01 01:36:48,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.23 | bwd_microstep: 5219.32 | bwd_inner_microstep: 4813.00 | bwd_allreduce_microstep: 406.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3601 [2024-08-01 01:36:57,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.70 | bwd_microstep: 5184.79 | bwd_inner_microstep: 5077.99 | bwd_allreduce_microstep: 106.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-08-01 01:37:05,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.52 | bwd_microstep: 4797.20 | bwd_inner_microstep: 4761.03 | bwd_allreduce_microstep: 36.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-08-01 01:37:13,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.74 | bwd_microstep: 5172.34 | bwd_inner_microstep: 4770.57 | bwd_allreduce_microstep: 401.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-08-01 01:37:22,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.01 | bwd_microstep: 4970.79 | bwd_inner_microstep: 4949.27 | bwd_allreduce_microstep: 21.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-08-01 01:37:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-08-01 01:37:30,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.25 | bwd_microstep: 4724.93 | bwd_inner_microstep: 4700.06 | bwd_allreduce_microstep: 24.80 | step_microstep: 181.48 [2024-08-01 01:37:30,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28320.18 | bwd: 40707.60 | bwd_inner: 39552.69 | bwd_allreduce: 1154.42 
| step: 182.06 97%|█████████▋| 1193/1230 [23:25:36<42:58, 69.69s/it] {'loss': 1.153, 'learning_rate': 4.742951735937418e-08, 'epoch': 0.97} 97%|█████████▋| 1193/1230 [23:25:36<42:58, 69.69s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2414 [2024-08-01 01:37:39,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.48 | bwd_microstep: 5641.13 | bwd_inner_microstep: 5206.45 | bwd_allreduce_microstep: 434.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824 [2024-08-01 01:37:48,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.06 | bwd_microstep: 5185.66 | bwd_inner_microstep: 5131.48 | bwd_allreduce_microstep: 54.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-08-01 01:37:57,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.02 | bwd_microstep: 5116.34 | bwd_inner_microstep: 5065.44 | bwd_allreduce_microstep: 50.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-08-01 01:38:06,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.00 | bwd_microstep: 5004.28 | bwd_inner_microstep: 4982.45 | bwd_allreduce_microstep: 21.76 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-08-01 01:38:15,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.02 | bwd_microstep: 5037.17 | bwd_inner_microstep: 5014.18 | bwd_allreduce_microstep: 22.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-08-01 01:38:23,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.26 | bwd_microstep: 5089.93 | bwd_inner_microstep: 4693.27 | bwd_allreduce_microstep: 396.60 | step_microstep: 0.09 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3711 [2024-08-01 01:38:32,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.61 | bwd_microstep: 5052.78 | bwd_inner_microstep: 4994.77 | bwd_allreduce_microstep: 57.94 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3665 [2024-08-01 01:38:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 01:38:40,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3102.58 | bwd_microstep: 4936.88 | bwd_inner_microstep: 4881.31 | bwd_allreduce_microstep: 55.50 | step_microstep: 181.76 [2024-08-01 01:38:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28566.94 | bwd: 41064.15 | bwd_inner: 39969.29 | bwd_allreduce: 1094.38 | step: 182.36 97%|█████████▋| 1194/1230 [23:26:46<41:51, 69.77s/it] {'loss': 1.138, 'learning_rate': 4.490229950440239e-08, 'epoch': 0.97} 97%|█████████▋| 1194/1230 [23:26:46<41:51, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2337 [2024-08-01 01:38:49,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3169.94 | bwd_microstep: 5423.61 | bwd_inner_microstep: 5012.65 | bwd_allreduce_microstep: 410.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3794 [2024-08-01 01:38:58,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.96 | bwd_microstep: 5123.53 | bwd_inner_microstep: 5087.07 | bwd_allreduce_microstep: 36.38 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-08-01 01:39:06,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.95 | bwd_microstep: 5199.36 | bwd_inner_microstep: 5115.31 | bwd_allreduce_microstep: 83.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-08-01 01:39:15,745] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.15 | bwd_microstep: 5213.86 | bwd_inner_microstep: 4809.54 | bwd_allreduce_microstep: 404.25 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1132 [2024-08-01 01:39:24,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.37 | bwd_microstep: 5206.79 | bwd_inner_microstep: 4806.80 | bwd_allreduce_microstep: 399.92 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-08-01 01:39:33,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.25 | bwd_microstep: 4920.23 | bwd_inner_microstep: 4895.45 | bwd_allreduce_microstep: 24.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-08-01 01:39:41,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.92 | bwd_microstep: 5056.01 | bwd_inner_microstep: 4993.43 | bwd_allreduce_microstep: 62.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-08-01 01:39:50,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-08-01 01:39:50,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.63 | bwd_microstep: 5170.14 | bwd_inner_microstep: 5113.46 | bwd_allreduce_microstep: 56.61 | step_microstep: 181.61 [2024-08-01 01:39:50,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28505.07 | bwd: 41313.52 | bwd_inner: 39833.67 | bwd_allreduce: 1479.37 | step: 182.32 97%|█████████▋| 1195/1230 [23:27:56<40:45, 69.88s/it] {'loss': 1.1994, 'learning_rate': 4.2444115823562226e-08, 'epoch': 0.97} 97%|█████████▋| 1195/1230 [23:27:56<40:45, 69.88s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2032 [2024-08-01 01:39:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.13 | 
bwd_microstep: 5317.96 | bwd_inner_microstep: 4905.47 | bwd_allreduce_microstep: 412.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-08-01 01:40:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.79 | bwd_microstep: 5153.65 | bwd_inner_microstep: 5113.66 | bwd_allreduce_microstep: 39.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-08-01 01:40:17,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.71 | bwd_microstep: 5168.40 | bwd_inner_microstep: 4765.14 | bwd_allreduce_microstep: 403.19 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2072 [2024-08-01 01:40:26,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.73 | bwd_microstep: 5255.53 | bwd_inner_microstep: 4849.01 | bwd_allreduce_microstep: 406.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-08-01 01:40:34,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.28 | bwd_microstep: 5051.08 | bwd_inner_microstep: 4660.75 | bwd_allreduce_microstep: 390.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-08-01 01:40:43,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.24 | bwd_microstep: 4995.48 | bwd_inner_microstep: 4937.26 | bwd_allreduce_microstep: 58.15 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2098 [2024-08-01 01:40:51,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.06 | bwd_microstep: 5094.30 | bwd_inner_microstep: 4698.60 | bwd_allreduce_microstep: 395.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-08-01 01:41:00,745] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-08-01 01:41:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.14 | bwd_microstep: 4997.49 | bwd_inner_microstep: 4978.20 | bwd_allreduce_microstep: 19.22 | step_microstep: 181.56 [2024-08-01 01:41:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28661.95 | bwd: 41033.88 | bwd_inner: 38908.05 | bwd_allreduce: 2125.34 | step: 182.15 97%|█████████▋| 1196/1230 [23:29:06<39:37, 69.93s/it] {'loss': 1.0905, 'learning_rate': 4.005498336326463e-08, 'epoch': 0.97} 97%|█████████▋| 1196/1230 [23:29:06<39:37, 69.93s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3860 [2024-08-01 01:41:09,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.22 | bwd_microstep: 5132.75 | bwd_inner_microstep: 5113.58 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2898 [2024-08-01 01:41:18,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.38 | bwd_microstep: 5177.53 | bwd_inner_microstep: 4773.03 | bwd_allreduce_microstep: 404.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-08-01 01:41:27,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.01 | bwd_microstep: 5029.78 | bwd_inner_microstep: 5010.36 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-08-01 01:41:35,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.96 | bwd_microstep: 5172.91 | bwd_inner_microstep: 4771.28 | bwd_allreduce_microstep: 401.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-08-01 01:41:44,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.59 | 
bwd_microstep: 5122.92 | bwd_inner_microstep: 5054.50 | bwd_allreduce_microstep: 68.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-08-01 01:41:53,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.07 | bwd_microstep: 5226.10 | bwd_inner_microstep: 4821.37 | bwd_allreduce_microstep: 404.66 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3697 [2024-08-01 01:42:02,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.72 | bwd_microstep: 5075.68 | bwd_inner_microstep: 4998.09 | bwd_allreduce_microstep: 77.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3672 [2024-08-01 01:42:10,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-08-01 01:42:10,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.83 | bwd_microstep: 4914.91 | bwd_inner_microstep: 4889.27 | bwd_allreduce_microstep: 25.57 | step_microstep: 182.94 [2024-08-01 01:42:10,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29055.67 | bwd: 40852.55 | bwd_inner: 39431.42 | bwd_allreduce: 1420.65 | step: 183.52 97%|█████████▋| 1197/1230 [23:30:16<38:30, 70.02s/it] {'loss': 1.0826, 'learning_rate': 3.773491869108137e-08, 'epoch': 0.97} 97%|█████████▋| 1197/1230 [23:30:16<38:30, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4096 [2024-08-01 01:42:20,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.04 | bwd_microstep: 5343.94 | bwd_inner_microstep: 5312.43 | bwd_allreduce_microstep: 31.44 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 2840 [2024-08-01 01:42:29,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.49 | bwd_microstep: 5334.58 | bwd_inner_microstep: 4920.30 | 
bwd_allreduce_microstep: 414.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-08-01 01:42:38,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.83 | bwd_microstep: 5300.86 | bwd_inner_microstep: 5232.17 | bwd_allreduce_microstep: 68.63 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2217 [2024-08-01 01:42:46,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.91 | bwd_microstep: 5255.75 | bwd_inner_microstep: 4848.71 | bwd_allreduce_microstep: 406.97 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-08-01 01:42:55,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.63 | bwd_microstep: 5006.68 | bwd_inner_microstep: 4987.29 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-08-01 01:43:04,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.84 | bwd_microstep: 4880.39 | bwd_inner_microstep: 4861.04 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-08-01 01:43:12,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.81 | bwd_microstep: 4992.52 | bwd_inner_microstep: 4938.36 | bwd_allreduce_microstep: 54.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3666 [2024-08-01 01:43:21,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 01:43:21,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.56 | bwd_microstep: 4893.74 | bwd_inner_microstep: 4874.30 | bwd_allreduce_microstep: 19.35 | step_microstep: 183.58 [2024-08-01 01:43:21,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd: 29198.00 | bwd: 41008.42 | bwd_inner: 39974.54 | bwd_allreduce: 1033.37 | step: 184.29 97%|█████████▋| 1198/1230 [23:31:27<37:25, 70.18s/it] {'loss': 1.0926, 'learning_rate': 3.548393789562732e-08, 'epoch': 0.97} 97%|█████████▋| 1198/1230 [23:31:27<37:25, 70.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3566 [2024-08-01 01:43:29,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3186.67 | bwd_microstep: 4823.26 | bwd_inner_microstep: 4778.17 | bwd_allreduce_microstep: 45.01 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-08-01 01:43:38,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.01 | bwd_microstep: 5036.09 | bwd_inner_microstep: 5011.03 | bwd_allreduce_microstep: 24.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-08-01 01:43:47,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.02 | bwd_microstep: 5096.47 | bwd_inner_microstep: 5030.91 | bwd_allreduce_microstep: 65.49 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2117 [2024-08-01 01:43:55,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3051.26 | bwd_microstep: 5015.35 | bwd_inner_microstep: 4629.29 | bwd_allreduce_microstep: 385.99 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2874 [2024-08-01 01:44:03,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.11 | bwd_microstep: 5200.90 | bwd_inner_microstep: 4795.80 | bwd_allreduce_microstep: 405.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-08-01 01:44:12,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.56 | bwd_microstep: 4865.62 | bwd_inner_microstep: 4846.21 | 
bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-08-01 01:44:21,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.08 | bwd_microstep: 5146.52 | bwd_inner_microstep: 5069.37 | bwd_allreduce_microstep: 77.08 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3693 [2024-08-01 01:44:30,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 01:44:30,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.27 | bwd_microstep: 5081.84 | bwd_inner_microstep: 5016.20 | bwd_allreduce_microstep: 65.58 | step_microstep: 181.66 [2024-08-01 01:44:30,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28005.90 | bwd: 40266.02 | bwd_inner: 39176.90 | bwd_allreduce: 1088.62 | step: 182.25 97%|█████████▋| 1199/1230 [23:32:36<36:00, 69.71s/it] {'loss': 1.2021, 'learning_rate': 3.3302056586453916e-08, 'epoch': 0.97} 97%|█████████▋| 1199/1230 [23:32:36<36:00, 69.71s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3953 [2024-08-01 01:44:39,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.30 | bwd_microstep: 5171.27 | bwd_inner_microstep: 5152.20 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3779 [2024-08-01 01:44:47,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3362.08 | bwd_microstep: 5085.11 | bwd_inner_microstep: 5033.37 | bwd_allreduce_microstep: 51.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-08-01 01:44:56,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.93 | bwd_microstep: 5150.58 | bwd_inner_microstep: 5071.34 | bwd_allreduce_microstep: 79.17 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2223 [2024-08-01 01:45:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3021.72 | bwd_microstep: 4876.31 | bwd_inner_microstep: 4500.22 | bwd_allreduce_microstep: 376.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-08-01 01:45:12,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.25 | bwd_microstep: 4976.63 | bwd_inner_microstep: 4957.25 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-08-01 01:45:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.05 | bwd_microstep: 5162.70 | bwd_inner_microstep: 5109.37 | bwd_allreduce_microstep: 53.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-08-01 01:45:30,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.93 | bwd_microstep: 5150.70 | bwd_inner_microstep: 5078.21 | bwd_allreduce_microstep: 72.42 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2104 [2024-08-01 01:45:39,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-08-01 01:45:39,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.80 | bwd_microstep: 5096.25 | bwd_inner_microstep: 4700.06 | bwd_allreduce_microstep: 396.12 | step_microstep: 181.76 [2024-08-01 01:45:39,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28224.96 | bwd: 40669.52 | bwd_inner: 39601.96 | bwd_allreduce: 1067.08 | step: 182.34 98%|█████████▊| 1200/1230 [23:33:45<34:46, 69.56s/it] {'loss': 1.134, 'learning_rate': 3.118928989393699e-08, 'epoch': 0.98} 98%|█████████▊| 1200/1230 [23:33:45<34:46, 69.56s/it][INFO|trainer.py:2936] 2024-08-01 01:46:06,043 >> Saving model checkpoint to 
/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200 [INFO|configuration_utils.py:473] 2024-08-01 01:46:06,044 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/config.json [INFO|configuration_utils.py:594] 2024-08-01 01:46:06,045 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/generation_config.json [INFO|modeling_utils.py:2501] 2024-08-01 01:46:59,579 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2433] 2024-08-01 01:46:59,581 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-08-01 01:46:59,581 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-08-01 01:46:59,581 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/added_tokens.json [2024-08-01 01:46:59,620] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1200 is about to be saved! [2024-08-01 01:47:00,284] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/zero_pp_rank_0_mp_rank_00_model_states.pt [2024-08-01 01:47:00,285] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/zero_pp_rank_0_mp_rank_00_model_states.pt... 
[2024-08-01 01:47:02,108] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/zero_pp_rank_0_mp_rank_00_model_states.pt. [2024-08-01 01:47:02,218] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-08-01 01:48:03,508] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-08-01 01:48:03,509] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-08-01 01:48:03,533] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1200 is ready now! 
[INFO|trainer.py:3028] 2024-08-01 01:48:03,569 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/checkpoint-1000] due to args.save_total_limit dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2470 [2024-08-01 01:48:44,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.10 | bwd_microstep: 5360.27 | bwd_inner_microstep: 4950.27 | bwd_allreduce_microstep: 409.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-08-01 01:48:53,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.55 | bwd_microstep: 5266.11 | bwd_inner_microstep: 4859.23 | bwd_allreduce_microstep: 406.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-08-01 01:49:01,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.08 | bwd_microstep: 5149.30 | bwd_inner_microstep: 5097.06 | bwd_allreduce_microstep: 52.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-08-01 01:49:10,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.20 | bwd_microstep: 4956.80 | bwd_inner_microstep: 4924.76 | bwd_allreduce_microstep: 31.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-08-01 01:49:19,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.62 | bwd_microstep: 5134.16 | bwd_inner_microstep: 4735.38 | bwd_allreduce_microstep: 398.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-08-01 01:49:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.45 | bwd_microstep: 4980.89 | bwd_inner_microstep: 4944.30 | bwd_allreduce_microstep: 36.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3694 [2024-08-01 01:49:35,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3180.33 | bwd_microstep: 4716.48 | bwd_inner_microstep: 4696.95 | bwd_allreduce_microstep: 19.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-08-01 01:49:43,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 01:49:43,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3319.05 | bwd_microstep: 4789.87 | bwd_inner_microstep: 4762.02 | bwd_allreduce_microstep: 27.77 | step_microstep: 181.20 [2024-08-01 01:49:43,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28032.28 | bwd: 40353.84 | bwd_inner: 38969.93 | bwd_allreduce: 1383.43 | step: 181.76 98%|█████████▊| 1201/1230 [23:37:49<59:00, 122.08s/it] {'loss': 1.1044, 'learning_rate': 2.9145652469174666e-08, 'epoch': 0.98} 98%|█████████▊| 1201/1230 [23:37:49<59:00, 122.08s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2042 [2024-08-01 01:49:52,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.12 | bwd_microstep: 5331.36 | bwd_inner_microstep: 4919.40 | bwd_allreduce_microstep: 411.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-08-01 01:50:01,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.71 | bwd_microstep: 5257.44 | bwd_inner_microstep: 4848.09 | bwd_allreduce_microstep: 409.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-08-01 01:50:10,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.71 | bwd_microstep: 5251.47 | bwd_inner_microstep: 4844.28 | bwd_allreduce_microstep: 407.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3847 [2024-08-01 01:50:19,494] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.84 | bwd_microstep: 5095.51 | bwd_inner_microstep: 5076.12 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-08-01 01:50:27,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.25 | bwd_microstep: 4927.66 | bwd_inner_microstep: 4886.63 | bwd_allreduce_microstep: 40.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-08-01 01:50:36,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.33 | bwd_microstep: 4918.15 | bwd_inner_microstep: 4893.12 | bwd_allreduce_microstep: 24.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-08-01 01:50:45,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.91 | bwd_microstep: 4888.22 | bwd_inner_microstep: 4868.87 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-08-01 01:50:54,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-08-01 01:50:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.94 | bwd_microstep: 5053.09 | bwd_inner_microstep: 4995.41 | bwd_allreduce_microstep: 57.62 | step_microstep: 180.91 [2024-08-01 01:50:54,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28962.73 | bwd: 40722.87 | bwd_inner: 39331.84 | bwd_allreduce: 1390.52 | step: 181.48 98%|█████████▊| 1202/1230 [23:38:59<49:41, 106.46s/it] {'loss': 1.1576, 'learning_rate': 2.7171158483881855e-08, 'epoch': 0.98} 98%|█████████▊| 1202/1230 [23:38:59<49:41, 106.46s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3901 [2024-08-01 01:51:02,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.34 | 
bwd_microstep: 5139.93 | bwd_inner_microstep: 5120.74 | bwd_allreduce_microstep: 19.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3878 [2024-08-01 01:51:11,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.92 | bwd_microstep: 4956.75 | bwd_inner_microstep: 4937.13 | bwd_allreduce_microstep: 19.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-08-01 01:51:20,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.16 | bwd_microstep: 5034.47 | bwd_inner_microstep: 5007.12 | bwd_allreduce_microstep: 27.28 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2212 [2024-08-01 01:51:29,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.52 | bwd_microstep: 5235.00 | bwd_inner_microstep: 4828.79 | bwd_allreduce_microstep: 406.14 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 2840 [2024-08-01 01:51:37,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.62 | bwd_microstep: 5165.15 | bwd_inner_microstep: 4761.41 | bwd_allreduce_microstep: 403.67 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2139 [2024-08-01 01:51:45,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.03 | bwd_microstep: 4906.63 | bwd_inner_microstep: 4529.50 | bwd_allreduce_microstep: 377.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-08-01 01:51:53,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.63 | bwd_microstep: 4802.56 | bwd_inner_microstep: 4765.23 | bwd_allreduce_microstep: 37.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-08-01 01:52:02,872] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73 [2024-08-01 01:52:02,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.14 | bwd_microstep: 5018.81 | bwd_inner_microstep: 4982.12 | bwd_allreduce_microstep: 36.62 | step_microstep: 181.17 [2024-08-01 01:52:02,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28271.27 | bwd: 40259.26 | bwd_inner: 38931.98 | bwd_allreduce: 1326.80 | step: 181.74 98%|█████████▊| 1203/1230 [23:40:08<42:49, 95.18s/it] {'loss': 1.1418, 'learning_rate': 2.5265821630298116e-08, 'epoch': 0.98} 98%|█████████▊| 1203/1230 [23:40:08<42:49, 95.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-08-01 01:52:12,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.51 | bwd_microstep: 5607.27 | bwd_inner_microstep: 5496.98 | bwd_allreduce_microstep: 110.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-08-01 01:52:20,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.73 | bwd_microstep: 5099.18 | bwd_inner_microstep: 5055.75 | bwd_allreduce_microstep: 43.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-08-01 01:52:29,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.92 | bwd_microstep: 5238.54 | bwd_inner_microstep: 5175.46 | bwd_allreduce_microstep: 63.02 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2214 [2024-08-01 01:52:38,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.50 | bwd_microstep: 5242.16 | bwd_inner_microstep: 4835.08 | bwd_allreduce_microstep: 407.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-08-01 01:52:47,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.85 | 
bwd_microstep: 5033.09 | bwd_inner_microstep: 5006.34 | bwd_allreduce_microstep: 26.68 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2171 [2024-08-01 01:52:56,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.40 | bwd_microstep: 5091.75 | bwd_inner_microstep: 4699.23 | bwd_allreduce_microstep: 392.46 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3638 [2024-08-01 01:53:03,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3047.20 | bwd_microstep: 4820.59 | bwd_inner_microstep: 4778.22 | bwd_allreduce_microstep: 42.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-08-01 01:53:12,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-08-01 01:53:12,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.68 | bwd_microstep: 4869.01 | bwd_inner_microstep: 4849.64 | bwd_allreduce_microstep: 19.30 | step_microstep: 182.00 [2024-08-01 01:53:12,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28469.70 | bwd: 41001.57 | bwd_inner: 39896.64 | bwd_allreduce: 1104.45 | step: 182.58 98%|█████████▊| 1204/1230 [23:41:18<37:56, 87.57s/it] {'loss': 1.1132, 'learning_rate': 2.3429655121085525e-08, 'epoch': 0.98} 98%|█████████▊| 1204/1230 [23:41:18<37:56, 87.57s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4088 [2024-08-01 01:53:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3870.79 | bwd_microstep: 5349.24 | bwd_inner_microstep: 5330.11 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4052 [2024-08-01 01:53:30,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.46 | bwd_microstep: 5135.52 | bwd_inner_microstep: 5116.19 | 
bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-08-01 01:53:39,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.29 | bwd_microstep: 5451.66 | bwd_inner_microstep: 5032.44 | bwd_allreduce_microstep: 419.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228 [2024-08-01 01:53:48,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.67 | bwd_microstep: 5262.86 | bwd_inner_microstep: 4853.45 | bwd_allreduce_microstep: 409.34 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2094 [2024-08-01 01:53:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.97 | bwd_microstep: 5252.66 | bwd_inner_microstep: 4847.84 | bwd_allreduce_microstep: 404.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-08-01 01:54:06,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.05 | bwd_microstep: 5030.39 | bwd_inner_microstep: 5004.99 | bwd_allreduce_microstep: 25.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-08-01 01:54:15,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.44 | bwd_microstep: 5206.65 | bwd_inner_microstep: 5122.20 | bwd_allreduce_microstep: 84.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2143 [2024-08-01 01:54:23,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-08-01 01:54:23,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.90 | bwd_microstep: 5111.61 | bwd_inner_microstep: 4715.18 | bwd_allreduce_microstep: 396.36 | step_microstep: 182.19 [2024-08-01 01:54:23,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29180.47 | bwd: 41800.58 | bwd_inner: 40022.35 | bwd_allreduce: 1777.74 | step: 182.76 98%|█████████▊| 1205/1230 [23:42:29<34:27, 82.69s/it] {'loss': 1.0855, 'learning_rate': 2.1662671689242076e-08, 'epoch': 0.98} 98%|█████████▊| 1205/1230 [23:42:29<34:27, 82.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3994 [2024-08-01 01:54:32,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.07 | bwd_microstep: 5133.20 | bwd_inner_microstep: 5112.03 | bwd_allreduce_microstep: 21.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3573 [2024-08-01 01:54:41,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.55 | bwd_microstep: 5232.26 | bwd_inner_microstep: 5097.24 | bwd_allreduce_microstep: 134.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-08-01 01:54:50,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.83 | bwd_microstep: 4981.96 | bwd_inner_microstep: 4962.55 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3639 [2024-08-01 01:54:59,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.12 | bwd_microstep: 4907.63 | bwd_inner_microstep: 4876.72 | bwd_allreduce_microstep: 30.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-08-01 01:55:07,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3338.16 | bwd_microstep: 4952.83 | bwd_inner_microstep: 4907.57 | bwd_allreduce_microstep: 45.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-08-01 01:55:15,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.32 | bwd_microstep: 5026.37 | bwd_inner_microstep: 4962.81 | 
bwd_allreduce_microstep: 63.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-08-01 01:55:24,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.93 | bwd_microstep: 4932.13 | bwd_inner_microstep: 4887.90 | bwd_allreduce_microstep: 44.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2110 [2024-08-01 01:55:33,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-08-01 01:55:33,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.20 | bwd_microstep: 5061.49 | bwd_inner_microstep: 4666.45 | bwd_allreduce_microstep: 394.97 | step_microstep: 182.48 [2024-08-01 01:55:33,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28654.08 | bwd: 40227.85 | bwd_inner: 39473.20 | bwd_allreduce: 754.16 | step: 183.06 98%|█████████▊| 1206/1230 [23:43:39<31:27, 78.65s/it] {'loss': 1.1696, 'learning_rate': 1.9964883588010632e-08, 'epoch': 0.98} 98%|█████████▊| 1206/1230 [23:43:39<31:27, 78.65s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2389 [2024-08-01 01:55:42,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.27 | bwd_microstep: 5368.15 | bwd_inner_microstep: 4957.00 | bwd_allreduce_microstep: 411.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-08-01 01:55:50,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.70 | bwd_microstep: 5171.03 | bwd_inner_microstep: 5122.73 | bwd_allreduce_microstep: 48.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-08-01 01:55:59,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.96 | bwd_microstep: 5326.15 | bwd_inner_microstep: 5247.23 | bwd_allreduce_microstep: 78.85 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-08-01 01:56:08,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.83 | bwd_microstep: 5178.69 | bwd_inner_microstep: 4778.07 | bwd_allreduce_microstep: 400.55 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2262 [2024-08-01 01:56:17,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.34 | bwd_microstep: 5205.76 | bwd_inner_microstep: 4800.11 | bwd_allreduce_microstep: 405.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-08-01 01:56:26,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.75 | bwd_microstep: 4986.53 | bwd_inner_microstep: 4967.10 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-08-01 01:56:34,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.62 | bwd_microstep: 5077.20 | bwd_inner_microstep: 5016.76 | bwd_allreduce_microstep: 60.37 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3699 [2024-08-01 01:56:43,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-08-01 01:56:43,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.06 | bwd_microstep: 5137.37 | bwd_inner_microstep: 5075.37 | bwd_allreduce_microstep: 61.93 | step_microstep: 181.36 [2024-08-01 01:56:43,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28827.42 | bwd: 41450.86 | bwd_inner: 39964.31 | bwd_allreduce: 1486.06 | step: 181.94 98%|█████████▊| 1207/1230 [23:44:49<29:13, 76.24s/it] {'loss': 1.1172, 'learning_rate': 1.8336302590798992e-08, 'epoch': 0.98} 98%|█████████▊| 1207/1230 [23:44:49<29:13, 76.24s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-08-01 
01:56:52,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.29 | bwd_microstep: 5317.40 | bwd_inner_microstep: 5249.72 | bwd_allreduce_microstep: 67.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-08-01 01:57:01,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.53 | bwd_microstep: 5038.57 | bwd_inner_microstep: 5019.13 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832 [2024-08-01 01:57:10,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.25 | bwd_microstep: 5059.35 | bwd_inner_microstep: 5039.39 | bwd_allreduce_microstep: 19.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3803 [2024-08-01 01:57:19,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.05 | bwd_microstep: 5045.33 | bwd_inner_microstep: 5025.72 | bwd_allreduce_microstep: 19.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-08-01 01:57:28,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.51 | bwd_microstep: 5028.40 | bwd_inner_microstep: 5009.06 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3718 [2024-08-01 01:57:36,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.74 | bwd_microstep: 5162.62 | bwd_inner_microstep: 5088.78 | bwd_allreduce_microstep: 73.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-08-01 01:57:45,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.14 | bwd_microstep: 5078.50 | bwd_inner_microstep: 5012.13 | bwd_allreduce_microstep: 66.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3724 [2024-08-01 01:57:54,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-08-01 01:57:54,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.70 | bwd_microstep: 4876.06 | bwd_inner_microstep: 4849.49 | bwd_allreduce_microstep: 26.50 | step_microstep: 182.05 [2024-08-01 01:57:54,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29481.12 | bwd: 40606.21 | bwd_inner: 40293.37 | bwd_allreduce: 312.35 | step: 182.63 98%|█████████▊| 1208/1230 [23:46:00<27:18, 74.49s/it] {'loss': 1.0926, 'learning_rate': 1.677693999109109e-08, 'epoch': 0.98} 98%|█████████▊| 1208/1230 [23:46:00<27:18, 74.49s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2438 [2024-08-01 01:58:03,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.40 | bwd_microstep: 5335.09 | bwd_inner_microstep: 4923.21 | bwd_allreduce_microstep: 411.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305 [2024-08-01 01:58:12,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.44 | bwd_microstep: 5300.16 | bwd_inner_microstep: 4888.63 | bwd_allreduce_microstep: 411.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-08-01 01:58:20,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.08 | bwd_microstep: 4992.05 | bwd_inner_microstep: 4972.62 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3640 [2024-08-01 01:58:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.61 | bwd_microstep: 5105.32 | bwd_inner_microstep: 5018.97 | bwd_allreduce_microstep: 86.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-08-01 01:58:38,284] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.30 | bwd_microstep: 5194.90 | bwd_inner_microstep: 4789.36 | bwd_allreduce_microstep: 405.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-08-01 01:58:46,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.19 | bwd_microstep: 4911.52 | bwd_inner_microstep: 4892.19 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-08-01 01:58:55,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.89 | bwd_microstep: 4924.08 | bwd_inner_microstep: 4898.67 | bwd_allreduce_microstep: 25.34 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3693 [2024-08-01 01:59:04,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 01:59:04,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.44 | bwd_microstep: 5041.37 | bwd_inner_microstep: 4968.71 | bwd_allreduce_microstep: 72.59 | step_microstep: 182.69 [2024-08-01 01:59:04,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28982.26 | bwd: 40804.47 | bwd_inner: 39352.31 | bwd_allreduce: 1451.68 | step: 183.25 98%|█████████▊| 1209/1230 [23:47:10<25:36, 73.18s/it] {'loss': 1.1428, 'learning_rate': 1.5286806602372583e-08, 'epoch': 0.98} 98%|█████████▊| 1209/1230 [23:47:10<25:36, 73.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3992 [2024-08-01 01:59:13,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.06 | bwd_microstep: 5318.09 | bwd_inner_microstep: 5275.49 | bwd_allreduce_microstep: 42.53 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3815 [2024-08-01 01:59:21,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.65 | 
bwd_microstep: 4961.79 | bwd_inner_microstep: 4937.96 | bwd_allreduce_microstep: 23.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266 [2024-08-01 01:59:30,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.29 | bwd_microstep: 5285.18 | bwd_inner_microstep: 4876.61 | bwd_allreduce_microstep: 408.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-08-01 01:59:39,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.53 | bwd_microstep: 5013.38 | bwd_inner_microstep: 4994.06 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-08-01 01:59:48,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.13 | bwd_microstep: 5083.94 | bwd_inner_microstep: 5023.62 | bwd_allreduce_microstep: 60.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-08-01 01:59:56,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.02 | bwd_microstep: 5065.83 | bwd_inner_microstep: 5006.52 | bwd_allreduce_microstep: 59.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-08-01 02:00:05,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.21 | bwd_microstep: 5048.49 | bwd_inner_microstep: 4989.46 | bwd_allreduce_microstep: 58.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-08-01 02:00:14,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-08-01 02:00:14,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.84 | bwd_microstep: 5032.23 | bwd_inner_microstep: 4970.07 | bwd_allreduce_microstep: 62.10 | step_microstep: 181.22 [2024-08-01 
02:00:14,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28855.65 | bwd: 40808.91 | bwd_inner: 40073.73 | bwd_allreduce: 734.71 | step: 181.82 98%|█████████▊| 1210/1230 [23:48:20<24:04, 72.23s/it] {'loss': 1.1577, 'learning_rate': 1.3865912758054267e-08, 'epoch': 0.98} 98%|█████████▊| 1210/1230 [23:48:20<24:04, 72.23s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3838 [2024-08-01 02:00:23,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.85 | bwd_microstep: 5086.33 | bwd_inner_microstep: 5067.21 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-08-01 02:00:32,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.84 | bwd_microstep: 5037.32 | bwd_inner_microstep: 5014.54 | bwd_allreduce_microstep: 22.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3854 [2024-08-01 02:00:40,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.33 | bwd_microstep: 5112.40 | bwd_inner_microstep: 5093.09 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-08-01 02:00:49,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.03 | bwd_microstep: 5106.30 | bwd_inner_microstep: 5036.48 | bwd_allreduce_microstep: 69.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-08-01 02:00:58,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.12 | bwd_microstep: 5035.61 | bwd_inner_microstep: 4977.31 | bwd_allreduce_microstep: 58.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-08-01 02:01:06,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.32 | 
bwd_microstep: 4889.39 | bwd_inner_microstep: 4869.87 | bwd_allreduce_microstep: 19.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-08-01 02:01:15,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.50 | bwd_microstep: 5057.46 | bwd_inner_microstep: 4664.65 | bwd_allreduce_microstep: 392.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3643 [2024-08-01 02:01:24,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 02:01:24,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.21 | bwd_microstep: 5070.78 | bwd_inner_microstep: 4989.72 | bwd_allreduce_microstep: 80.99 | step_microstep: 180.90 [2024-08-01 02:01:24,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29152.11 | bwd: 40395.57 | bwd_inner: 39712.81 | bwd_allreduce: 682.26 | step: 181.47 98%|█████████▊| 1211/1230 [23:49:30<22:38, 71.52s/it] {'loss': 1.1354, 'learning_rate': 1.2514268311405452e-08, 'epoch': 0.98} 98%|█████████▊| 1211/1230 [23:49:30<22:38, 71.52s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3275 [2024-08-01 02:01:33,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.00 | bwd_microstep: 5252.98 | bwd_inner_microstep: 5043.28 | bwd_allreduce_microstep: 209.63 | step_microstep: 0.08 dynamic ViT batch size: 23, images per sample: 11.5, dynamic token length: 3872 [2024-08-01 02:01:41,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.84 | bwd_microstep: 5029.12 | bwd_inner_microstep: 5007.84 | bwd_allreduce_microstep: 21.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2304 [2024-08-01 02:01:50,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.26 | bwd_microstep: 5223.68 | bwd_inner_microstep: 4816.83 | 
bwd_allreduce_microstep: 406.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-08-01 02:01:58,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3049.74 | bwd_microstep: 5003.96 | bwd_inner_microstep: 4618.23 | bwd_allreduce_microstep: 385.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3733 [2024-08-01 02:02:07,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.49 | bwd_microstep: 5203.80 | bwd_inner_microstep: 5119.04 | bwd_allreduce_microstep: 84.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-08-01 02:02:16,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.36 | bwd_microstep: 5304.48 | bwd_inner_microstep: 4895.07 | bwd_allreduce_microstep: 409.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-08-01 02:02:25,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.63 | bwd_microstep: 4957.67 | bwd_inner_microstep: 4932.04 | bwd_allreduce_microstep: 25.56 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3668 [2024-08-01 02:02:33,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-08-01 02:02:33,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.82 | bwd_microstep: 5057.39 | bwd_inner_microstep: 4984.18 | bwd_allreduce_microstep: 73.14 | step_microstep: 181.73 [2024-08-01 02:02:33,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28374.05 | bwd: 41033.06 | bwd_inner: 39416.42 | bwd_allreduce: 1616.15 | step: 182.31 99%|█████████▊| 1212/1230 [23:50:39<21:17, 70.99s/it] {'loss': 1.131, 'learning_rate': 1.1231882635477364e-08, 'epoch': 0.99} 99%|█████████▊| 1212/1230 [23:50:39<21:17, 70.99s/it]dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 4012 [2024-08-01 02:02:42,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.20 | bwd_microstep: 5224.07 | bwd_inner_microstep: 5187.97 | bwd_allreduce_microstep: 36.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-08-01 02:02:51,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.01 | bwd_microstep: 5202.83 | bwd_inner_microstep: 5120.56 | bwd_allreduce_microstep: 82.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-08-01 02:03:00,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3390.37 | bwd_microstep: 5129.91 | bwd_inner_microstep: 5063.92 | bwd_allreduce_microstep: 65.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-08-01 02:03:08,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.55 | bwd_microstep: 5243.98 | bwd_inner_microstep: 5148.06 | bwd_allreduce_microstep: 95.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-08-01 02:03:17,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.17 | bwd_microstep: 5060.69 | bwd_inner_microstep: 5003.08 | bwd_allreduce_microstep: 57.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-08-01 02:03:26,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.36 | bwd_microstep: 5185.34 | bwd_inner_microstep: 5112.48 | bwd_allreduce_microstep: 72.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-08-01 02:03:35,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.70 | bwd_microstep: 5090.97 | bwd_inner_microstep: 5047.88 | 
bwd_allreduce_microstep: 43.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2113 [2024-08-01 02:03:43,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-08-01 02:03:43,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2967.02 | bwd_microstep: 4841.69 | bwd_inner_microstep: 4469.74 | bwd_allreduce_microstep: 371.88 | step_microstep: 181.85 [2024-08-01 02:03:43,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27894.28 | bwd: 40979.47 | bwd_inner: 40153.63 | bwd_allreduce: 825.36 | step: 182.42 99%|█████████▊| 1213/1230 [23:51:49<19:57, 70.45s/it] {'loss': 1.161, 'learning_rate': 1.0018764623044297e-08, 'epoch': 0.99} 99%|█████████▊| 1213/1230 [23:51:49<19:57, 70.45s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3979 [2024-08-01 02:03:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3872.32 | bwd_microstep: 5359.28 | bwd_inner_microstep: 5327.64 | bwd_allreduce_microstep: 31.58 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2028 [2024-08-01 02:04:00,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.76 | bwd_microstep: 5062.29 | bwd_inner_microstep: 4673.00 | bwd_allreduce_microstep: 389.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-08-01 02:04:08,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.89 | bwd_microstep: 4960.57 | bwd_inner_microstep: 4898.28 | bwd_allreduce_microstep: 62.23 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3634 [2024-08-01 02:04:16,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3116.62 | bwd_microstep: 4998.21 | bwd_inner_microstep: 4931.98 | bwd_allreduce_microstep: 66.16 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3739 [2024-08-01 02:04:25,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.57 | bwd_microstep: 5168.03 | bwd_inner_microstep: 5112.80 | bwd_allreduce_microstep: 55.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-08-01 02:04:33,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2992.23 | bwd_microstep: 4858.86 | bwd_inner_microstep: 4484.23 | bwd_allreduce_microstep: 374.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 02:04:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.01 | bwd_microstep: 5001.83 | bwd_inner_microstep: 4947.64 | bwd_allreduce_microstep: 54.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-08-01 02:04:50,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-08-01 02:04:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.27 | bwd_microstep: 5097.18 | bwd_inner_microstep: 4700.10 | bwd_allreduce_microstep: 397.01 | step_microstep: 181.54 [2024-08-01 02:04:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26962.57 | bwd: 40506.24 | bwd_inner: 39075.60 | bwd_allreduce: 1430.16 | step: 182.12 99%|█████████▊| 1214/1230 [23:52:56<18:34, 69.66s/it] {'loss': 1.1174, 'learning_rate': 8.874922686541442e-09, 'epoch': 0.99} 99%|█████████▊| 1214/1230 [23:52:56<18:34, 69.66s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3908 [2024-08-01 02:04:59,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3296.40 | bwd_microstep: 4976.09 | bwd_inner_microstep: 4954.06 | bwd_allreduce_microstep: 21.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3906 [2024-08-01 
02:05:08,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.30 | bwd_microstep: 5135.59 | bwd_inner_microstep: 5100.76 | bwd_allreduce_microstep: 34.76 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2074 [2024-08-01 02:05:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.56 | bwd_microstep: 5217.32 | bwd_inner_microstep: 4812.88 | bwd_allreduce_microstep: 404.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-08-01 02:05:25,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.32 | bwd_microstep: 5130.68 | bwd_inner_microstep: 5047.20 | bwd_allreduce_microstep: 83.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-08-01 02:05:34,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.00 | bwd_microstep: 5029.64 | bwd_inner_microstep: 5010.31 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-08-01 02:05:42,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.70 | bwd_microstep: 4878.38 | bwd_inner_microstep: 4859.09 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2161 [2024-08-01 02:05:51,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.11 | bwd_microstep: 5173.83 | bwd_inner_microstep: 4769.33 | bwd_allreduce_microstep: 404.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3673 [2024-08-01 02:06:00,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-08-01 02:06:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.56 | bwd_microstep: 5007.83 | 
bwd_inner_microstep: 4972.77 | bwd_allreduce_microstep: 35.00 | step_microstep: 181.40 [2024-08-01 02:06:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28737.88 | bwd: 40549.35 | bwd_inner: 39526.34 | bwd_allreduce: 1022.53 | step: 181.97 99%|█████████▉| 1215/1230 [23:54:06<17:24, 69.64s/it] {'loss': 1.0942, 'learning_rate': 7.800364758002721e-09, 'epoch': 0.99} 99%|█████████▉| 1215/1230 [23:54:06<17:24, 69.64s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3559 [2024-08-01 02:06:09,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3447.74 | bwd_microstep: 5375.97 | bwd_inner_microstep: 5243.32 | bwd_allreduce_microstep: 132.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-08-01 02:06:18,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.27 | bwd_microstep: 5229.93 | bwd_inner_microstep: 5167.49 | bwd_allreduce_microstep: 62.38 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3835 [2024-08-01 02:06:27,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.75 | bwd_microstep: 5210.01 | bwd_inner_microstep: 5140.72 | bwd_allreduce_microstep: 69.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-08-01 02:06:35,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.63 | bwd_microstep: 4998.17 | bwd_inner_microstep: 4978.80 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-08-01 02:06:44,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.89 | bwd_microstep: 5021.58 | bwd_inner_microstep: 5002.23 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-08-01 
02:06:52,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.62 | bwd_microstep: 4707.35 | bwd_inner_microstep: 4687.90 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-08-01 02:07:01,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.48 | bwd_microstep: 5110.15 | bwd_inner_microstep: 5063.07 | bwd_allreduce_microstep: 47.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-08-01 02:07:10,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-08-01 02:07:10,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.18 | bwd_microstep: 4991.00 | bwd_inner_microstep: 4971.59 | bwd_allreduce_microstep: 19.34 | step_microstep: 181.71 [2024-08-01 02:07:10,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28706.47 | bwd: 40644.16 | bwd_inner: 40255.07 | bwd_allreduce: 388.59 | step: 182.30 99%|█████████▉| 1216/1230 [23:55:16<16:15, 69.66s/it] {'loss': 1.1148, 'learning_rate': 6.795098289008595e-09, 'epoch': 0.99} 99%|█████████▉| 1216/1230 [23:55:16<16:15, 69.66s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3714 [2024-08-01 02:07:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.95 | bwd_microstep: 5477.76 | bwd_inner_microstep: 5340.56 | bwd_allreduce_microstep: 137.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-08-01 02:07:28,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.93 | bwd_microstep: 5325.15 | bwd_inner_microstep: 5255.40 | bwd_allreduce_microstep: 69.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-08-01 02:07:37,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3591.02 | bwd_microstep: 5352.44 | bwd_inner_microstep: 4939.69 | bwd_allreduce_microstep: 412.68 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3819 [2024-08-01 02:07:46,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.48 | bwd_microstep: 5115.94 | bwd_inner_microstep: 5056.67 | bwd_allreduce_microstep: 59.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-08-01 02:07:54,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.17 | bwd_microstep: 5156.02 | bwd_inner_microstep: 5109.52 | bwd_allreduce_microstep: 46.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-08-01 02:08:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.67 | bwd_microstep: 5137.53 | bwd_inner_microstep: 5057.69 | bwd_allreduce_microstep: 79.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-08-01 02:08:12,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.06 | bwd_microstep: 4981.02 | bwd_inner_microstep: 4961.59 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-08-01 02:08:21,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-08-01 02:08:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.51 | bwd_microstep: 5167.71 | bwd_inner_microstep: 5091.41 | bwd_allreduce_microstep: 76.23 | step_microstep: 181.19 [2024-08-01 02:08:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29116.69 | bwd: 41713.54 | bwd_inner: 40812.48 | bwd_allreduce: 900.59 | step: 181.78 99%|█████████▉| 1217/1230 [23:56:27<15:11, 70.11s/it] {'loss': 1.1866, 'learning_rate': 5.859130250636113e-09, 'epoch': 
0.99} 99%|█████████▉| 1217/1230 [23:56:27<15:11, 70.11s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2316 [2024-08-01 02:08:30,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.31 | bwd_microstep: 5216.93 | bwd_inner_microstep: 4820.05 | bwd_allreduce_microstep: 396.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-08-01 02:08:38,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.09 | bwd_microstep: 5192.09 | bwd_inner_microstep: 5131.51 | bwd_allreduce_microstep: 60.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3787 [2024-08-01 02:08:47,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.92 | bwd_microstep: 5060.99 | bwd_inner_microstep: 5036.85 | bwd_allreduce_microstep: 24.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-08-01 02:08:56,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.01 | bwd_microstep: 5051.07 | bwd_inner_microstep: 5022.32 | bwd_allreduce_microstep: 28.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-08-01 02:09:04,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.11 | bwd_microstep: 4724.45 | bwd_inner_microstep: 4701.69 | bwd_allreduce_microstep: 22.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-08-01 02:09:13,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3696.00 | bwd_microstep: 4892.40 | bwd_inner_microstep: 4871.93 | bwd_allreduce_microstep: 20.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-08-01 02:09:21,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3510.03 | bwd_microstep: 5091.98 | bwd_inner_microstep: 4697.10 | bwd_allreduce_microstep: 394.78 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-08-01 02:09:30,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-08-01 02:09:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.40 | bwd_microstep: 4988.27 | bwd_inner_microstep: 4939.11 | bwd_allreduce_microstep: 49.09 | step_microstep: 181.70 [2024-08-01 02:09:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28540.78 | bwd: 40218.16 | bwd_inner: 39220.49 | bwd_allreduce: 997.18 | step: 182.27 99%|█████████▉| 1218/1230 [23:57:36<13:57, 69.80s/it] {'loss': 1.1487, 'learning_rate': 4.992467133406731e-09, 'epoch': 0.99} 99%|█████████▉| 1218/1230 [23:57:36<13:57, 69.80s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3558 [2024-08-01 02:09:39,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.65 | bwd_microstep: 5306.73 | bwd_inner_microstep: 5204.23 | bwd_allreduce_microstep: 102.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2309 [2024-08-01 02:09:48,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.61 | bwd_microstep: 5270.98 | bwd_inner_microstep: 4862.39 | bwd_allreduce_microstep: 408.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3906 [2024-08-01 02:09:57,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.23 | bwd_microstep: 5156.17 | bwd_inner_microstep: 5136.86 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3879 [2024-08-01 02:10:06,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3812.21 | bwd_microstep: 5184.48 | bwd_inner_microstep: 5157.91 | 
bwd_allreduce_microstep: 26.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3755 [2024-08-01 02:10:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.69 | bwd_microstep: 5270.73 | bwd_inner_microstep: 5181.91 | bwd_allreduce_microstep: 88.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-08-01 02:10:24,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.57 | bwd_microstep: 5164.91 | bwd_inner_microstep: 5092.68 | bwd_allreduce_microstep: 72.16 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2091 [2024-08-01 02:10:32,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.39 | bwd_microstep: 5105.72 | bwd_inner_microstep: 4708.64 | bwd_allreduce_microstep: 397.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-08-01 02:10:41,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 02:10:41,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3394.75 | bwd_microstep: 4987.69 | bwd_inner_microstep: 4930.73 | bwd_allreduce_microstep: 56.89 | step_microstep: 197.54 [2024-08-01 02:10:41,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28998.99 | bwd: 41447.39 | bwd_inner: 40275.29 | bwd_allreduce: 1171.61 | step: 198.12 99%|█████████▉| 1219/1230 [23:58:47<12:51, 70.10s/it] {'loss': 1.0937, 'learning_rate': 4.195114947244117e-09, 'epoch': 0.99} 99%|█████████▉| 1219/1230 [23:58:47<12:51, 70.10s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3993 [2024-08-01 02:10:50,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.37 | bwd_microstep: 5277.41 | bwd_inner_microstep: 5258.32 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2254 [2024-08-01 02:10:58,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3077.22 | bwd_microstep: 5109.72 | bwd_inner_microstep: 4716.05 | bwd_allreduce_microstep: 393.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2322 [2024-08-01 02:11:07,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.68 | bwd_microstep: 5225.80 | bwd_inner_microstep: 4819.79 | bwd_allreduce_microstep: 405.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-08-01 02:11:15,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3276.76 | bwd_microstep: 5046.26 | bwd_inner_microstep: 4657.12 | bwd_allreduce_microstep: 389.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-08-01 02:11:24,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.89 | bwd_microstep: 4970.15 | bwd_inner_microstep: 4940.07 | bwd_allreduce_microstep: 30.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-08-01 02:11:33,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.35 | bwd_microstep: 5173.50 | bwd_inner_microstep: 4770.70 | bwd_allreduce_microstep: 402.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3649 [2024-08-01 02:11:41,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.70 | bwd_microstep: 4905.00 | bwd_inner_microstep: 4879.01 | bwd_allreduce_microstep: 25.93 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2140 [2024-08-01 02:11:50,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-08-01 02:11:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3499.80 | bwd_microstep: 5100.53 | bwd_inner_microstep: 4705.27 | bwd_allreduce_microstep: 395.20 | step_microstep: 182.50 [2024-08-01 02:11:50,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28182.67 | bwd: 40808.35 | bwd_inner: 38746.27 | bwd_allreduce: 2061.60 | step: 183.06 99%|█████████▉| 1220/1230 [23:59:56<11:38, 69.87s/it] {'loss': 1.1327, 'learning_rate': 3.4670792214297476e-09, 'epoch': 0.99} 99%|█████████▉| 1220/1230 [23:59:56<11:38, 69.87s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4096 [2024-08-01 02:11:59,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3891.19 | bwd_microstep: 5397.80 | bwd_inner_microstep: 5370.80 | bwd_allreduce_microstep: 26.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-08-01 02:12:08,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.36 | bwd_microstep: 5126.22 | bwd_inner_microstep: 5050.15 | bwd_allreduce_microstep: 76.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-08-01 02:12:17,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.95 | bwd_microstep: 5159.10 | bwd_inner_microstep: 5109.45 | bwd_allreduce_microstep: 49.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-08-01 02:12:26,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.66 | bwd_microstep: 5100.59 | bwd_inner_microstep: 5054.56 | bwd_allreduce_microstep: 45.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-08-01 02:12:34,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.33 | bwd_microstep: 4835.94 | bwd_inner_microstep: 4789.31 | bwd_allreduce_microstep: 46.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per 
sample: 13.0, dynamic token length: 3723 [2024-08-01 02:12:43,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.12 | bwd_microstep: 5011.92 | bwd_inner_microstep: 4988.72 | bwd_allreduce_microstep: 23.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2136 [2024-08-01 02:12:51,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3064.53 | bwd_microstep: 5059.45 | bwd_inner_microstep: 4671.70 | bwd_allreduce_microstep: 387.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-08-01 02:12:59,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-08-01 02:12:59,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.58 | bwd_microstep: 4898.62 | bwd_inner_microstep: 4520.90 | bwd_allreduce_microstep: 377.65 | step_microstep: 186.42 [2024-08-01 02:12:59,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27722.64 | bwd: 40589.62 | bwd_inner: 39555.53 | bwd_allreduce: 1033.62 | step: 187.01 99%|█████████▉| 1221/1230 [24:01:05<10:25, 69.50s/it] {'loss': 1.1506, 'learning_rate': 2.808365004569602e-09, 'epoch': 0.99} 99%|█████████▉| 1221/1230 [24:01:05<10:25, 69.50s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3852 [2024-08-01 02:13:08,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.81 | bwd_microstep: 5321.73 | bwd_inner_microstep: 5258.78 | bwd_allreduce_microstep: 62.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3845 [2024-08-01 02:13:17,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.97 | bwd_microstep: 5114.35 | bwd_inner_microstep: 5094.73 | bwd_allreduce_microstep: 19.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-08-01 02:13:26,039] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.25 | bwd_microstep: 5180.36 | bwd_inner_microstep: 5098.94 | bwd_allreduce_microstep: 81.36 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2084 [2024-08-01 02:13:34,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.40 | bwd_microstep: 5196.99 | bwd_inner_microstep: 4794.37 | bwd_allreduce_microstep: 402.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-08-01 02:13:43,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.73 | bwd_microstep: 5177.34 | bwd_inner_microstep: 5118.30 | bwd_allreduce_microstep: 58.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-08-01 02:13:52,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.75 | bwd_microstep: 5057.28 | bwd_inner_microstep: 5016.34 | bwd_allreduce_microstep: 40.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-08-01 02:14:01,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.80 | bwd_microstep: 4982.49 | bwd_inner_microstep: 4963.11 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-08-01 02:14:09,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-08-01 02:14:09,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.71 | bwd_microstep: 4890.17 | bwd_inner_microstep: 4870.84 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.44 [2024-08-01 02:14:09,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29412.31 | bwd: 40920.71 | bwd_inner: 40215.36 | bwd_allreduce: 704.86 | step: 182.01 99%|█████████▉| 1222/1230 [24:02:15<09:18, 69.85s/it] {'loss': 1.1333, 
'learning_rate': 2.2189768645519693e-09, 'epoch': 0.99} 99%|█████████▉| 1222/1230 [24:02:15<09:18, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2389 [2024-08-01 02:14:19,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.39 | bwd_microstep: 5491.00 | bwd_inner_microstep: 5069.81 | bwd_allreduce_microstep: 421.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-08-01 02:14:27,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.93 | bwd_microstep: 5006.03 | bwd_inner_microstep: 4984.32 | bwd_allreduce_microstep: 21.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-08-01 02:14:36,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.93 | bwd_microstep: 5145.71 | bwd_inner_microstep: 5094.41 | bwd_allreduce_microstep: 51.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-08-01 02:14:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.81 | bwd_microstep: 5425.06 | bwd_inner_microstep: 5006.23 | bwd_allreduce_microstep: 418.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3867 [2024-08-01 02:14:54,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.64 | bwd_microstep: 5062.04 | bwd_inner_microstep: 5027.45 | bwd_allreduce_microstep: 34.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-08-01 02:15:03,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.39 | bwd_microstep: 5105.75 | bwd_inner_microstep: 5059.34 | bwd_allreduce_microstep: 46.35 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-08-01 02:15:11,680] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3458.03 | bwd_microstep: 5028.33 | bwd_inner_microstep: 4638.66 | bwd_allreduce_microstep: 389.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-08-01 02:15:19,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-08-01 02:15:19,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.96 | bwd_microstep: 4729.04 | bwd_inner_microstep: 4703.99 | bwd_allreduce_microstep: 24.98 | step_microstep: 181.91 [2024-08-01 02:15:19,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28514.96 | bwd: 40992.94 | bwd_inner: 39584.15 | bwd_allreduce: 1408.31 | step: 182.50 99%|█████████▉| 1223/1230 [24:03:25<08:08, 69.85s/it] {'loss': 1.1423, 'learning_rate': 1.6989188885219165e-09, 'epoch': 0.99} 99%|█████████▉| 1223/1230 [24:03:25<08:08, 69.85s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1966 [2024-08-01 02:15:29,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.23 | bwd_microstep: 5591.58 | bwd_inner_microstep: 5162.53 | bwd_allreduce_microstep: 428.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3863 [2024-08-01 02:15:38,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.90 | bwd_microstep: 5145.14 | bwd_inner_microstep: 5117.96 | bwd_allreduce_microstep: 27.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-08-01 02:15:46,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.22 | bwd_microstep: 5197.54 | bwd_inner_microstep: 4792.51 | bwd_allreduce_microstep: 404.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-08-01 02:15:55,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.40 | 
bwd_microstep: 5161.51 | bwd_inner_microstep: 5104.49 | bwd_allreduce_microstep: 56.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-08-01 02:16:04,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.90 | bwd_microstep: 5033.29 | bwd_inner_microstep: 5006.20 | bwd_allreduce_microstep: 27.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-08-01 02:16:12,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.94 | bwd_microstep: 4909.69 | bwd_inner_microstep: 4890.38 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3707 [2024-08-01 02:16:21,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.84 | bwd_microstep: 5001.95 | bwd_inner_microstep: 4936.32 | bwd_allreduce_microstep: 65.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-08-01 02:16:30,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-08-01 02:16:30,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3395.57 | bwd_microstep: 4904.54 | bwd_inner_microstep: 4866.06 | bwd_allreduce_microstep: 38.41 | step_microstep: 181.66 [2024-08-01 02:16:30,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28960.92 | bwd: 40945.23 | bwd_inner: 39876.38 | bwd_allreduce: 1068.37 | step: 182.24 100%|█████████▉| 1224/1230 [24:04:35<06:59, 69.97s/it] {'loss': 1.1475, 'learning_rate': 1.2481946828502011e-09, 'epoch': 1.0} 100%|█████████▉| 1224/1230 [24:04:35<06:59, 69.97s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3937 [2024-08-01 02:16:39,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3801.91 | bwd_microstep: 5179.57 | bwd_inner_microstep: 5160.46 | 
bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3866 [2024-08-01 02:16:48,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3671.68 | bwd_microstep: 5342.23 | bwd_inner_microstep: 5252.86 | bwd_allreduce_microstep: 89.30 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3809 [2024-08-01 02:16:56,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3124.09 | bwd_microstep: 5059.67 | bwd_inner_microstep: 5016.91 | bwd_allreduce_microstep: 42.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-08-01 02:17:05,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.06 | bwd_microstep: 5042.09 | bwd_inner_microstep: 5022.69 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-08-01 02:17:13,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.57 | bwd_microstep: 5005.18 | bwd_inner_microstep: 4985.86 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2132 [2024-08-01 02:17:22,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.41 | bwd_microstep: 5175.87 | bwd_inner_microstep: 4771.94 | bwd_allreduce_microstep: 403.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-08-01 02:17:31,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.39 | bwd_microstep: 5086.76 | bwd_inner_microstep: 4692.58 | bwd_allreduce_microstep: 394.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-08-01 02:17:39,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 
[2024-08-01 02:17:39,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.49 | bwd_microstep: 4995.28 | bwd_inner_microstep: 4942.07 | bwd_allreduce_microstep: 53.14 | step_microstep: 183.58 [2024-08-01 02:17:39,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28684.48 | bwd: 40886.64 | bwd_inner: 39845.31 | bwd_allreduce: 1040.83 | step: 184.17 100%|█████████▉| 1225/1230 [24:05:45<05:49, 69.95s/it] {'loss': 1.1123, 'learning_rate': 8.668073731088467e-10, 'epoch': 1.0} 100%|█████████▉| 1225/1230 [24:05:45<05:49, 69.95s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2051 [2024-08-01 02:17:48,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.13 | bwd_microstep: 5385.79 | bwd_inner_microstep: 4973.92 | bwd_allreduce_microstep: 411.80 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2015 [2024-08-01 02:17:57,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.07 | bwd_microstep: 5287.52 | bwd_inner_microstep: 4877.67 | bwd_allreduce_microstep: 409.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-08-01 02:18:06,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.48 | bwd_microstep: 5227.08 | bwd_inner_microstep: 4819.99 | bwd_allreduce_microstep: 407.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-08-01 02:18:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.49 | bwd_microstep: 5081.12 | bwd_inner_microstep: 5034.58 | bwd_allreduce_microstep: 46.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-08-01 02:18:23,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.11 | bwd_microstep: 4957.04 | bwd_inner_microstep: 4932.72 | 
bwd_allreduce_microstep: 24.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2157 [2024-08-01 02:18:32,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.30 | bwd_microstep: 5176.39 | bwd_inner_microstep: 4775.32 | bwd_allreduce_microstep: 401.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-08-01 02:18:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.70 | bwd_microstep: 5068.17 | bwd_inner_microstep: 5008.30 | bwd_allreduce_microstep: 59.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-08-01 02:18:49,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-08-01 02:18:49,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3393.40 | bwd_microstep: 4895.68 | bwd_inner_microstep: 4854.51 | bwd_allreduce_microstep: 41.10 | step_microstep: 181.17 [2024-08-01 02:18:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28489.58 | bwd: 41078.77 | bwd_inner: 39276.94 | bwd_allreduce: 1801.35 | step: 181.75 100%|█████████▉| 1226/1230 [24:06:55<04:39, 69.93s/it] {'loss': 1.1709, 'learning_rate': 5.547596040489378e-10, 'epoch': 1.0} 100%|█████████▉| 1226/1230 [24:06:55<04:39, 69.93s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3916 [2024-08-01 02:18:59,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.69 | bwd_microstep: 5462.41 | bwd_inner_microstep: 5385.87 | bwd_allreduce_microstep: 76.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225 [2024-08-01 02:19:07,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.20 | bwd_microstep: 5330.16 | bwd_inner_microstep: 4916.65 | bwd_allreduce_microstep: 413.44 | step_microstep: 0.09 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-08-01 02:19:16,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.52 | bwd_microstep: 5033.45 | bwd_inner_microstep: 5014.18 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3750 [2024-08-01 02:19:25,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.15 | bwd_microstep: 5101.92 | bwd_inner_microstep: 5054.72 | bwd_allreduce_microstep: 47.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-08-01 02:19:34,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.17 | bwd_microstep: 4982.21 | bwd_inner_microstep: 4962.90 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-08-01 02:19:43,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.14 | bwd_microstep: 5262.37 | bwd_inner_microstep: 5175.98 | bwd_allreduce_microstep: 86.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-08-01 02:19:51,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.07 | bwd_microstep: 5052.16 | bwd_inner_microstep: 4991.44 | bwd_allreduce_microstep: 60.65 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3769 [2024-08-01 02:20:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-08-01 02:20:00,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.71 | bwd_microstep: 5018.33 | bwd_inner_microstep: 4975.83 | bwd_allreduce_microstep: 42.44 | step_microstep: 181.08 [2024-08-01 02:20:00,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29086.55 | bwd: 41242.99 | bwd_inner: 40477.50 | bwd_allreduce: 
765.01 | step: 181.67 100%|█████████▉| 1227/1230 [24:08:06<03:30, 70.15s/it] {'loss': 1.0993, 'learning_rate': 3.1205353958285724e-10, 'epoch': 1.0} 100%|█████████▉| 1227/1230 [24:08:06<03:30, 70.15s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-08-01 02:20:09,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.55 | bwd_microstep: 5571.67 | bwd_inner_microstep: 5470.74 | bwd_allreduce_microstep: 100.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3546 [2024-08-01 02:20:18,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.37 | bwd_microstep: 5155.19 | bwd_inner_microstep: 5067.07 | bwd_allreduce_microstep: 88.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3806 [2024-08-01 02:20:27,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.93 | bwd_microstep: 5069.38 | bwd_inner_microstep: 5047.26 | bwd_allreduce_microstep: 22.06 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-08-01 02:20:36,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.76 | bwd_microstep: 5069.39 | bwd_inner_microstep: 5041.20 | bwd_allreduce_microstep: 28.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-08-01 02:20:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.05 | bwd_microstep: 5024.94 | bwd_inner_microstep: 5001.07 | bwd_allreduce_microstep: 23.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-08-01 02:20:53,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.87 | bwd_microstep: 5225.55 | bwd_inner_microstep: 5159.27 | bwd_allreduce_microstep: 66.21 | step_microstep: 0.08 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3684 [2024-08-01 02:21:02,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.98 | bwd_microstep: 5056.24 | bwd_inner_microstep: 4996.50 | bwd_allreduce_microstep: 59.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-08-01 02:21:11,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-08-01 02:21:11,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.82 | bwd_microstep: 5036.11 | bwd_inner_microstep: 4979.85 | bwd_allreduce_microstep: 56.20 | step_microstep: 181.42 [2024-08-01 02:21:11,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29399.23 | bwd: 41208.46 | bwd_inner: 40762.89 | bwd_allreduce: 445.10 | step: 182.01 100%|█████████▉| 1228/1230 [24:09:17<02:20, 70.39s/it] {'loss': 1.0955, 'learning_rate': 1.3869086276985243e-10, 'epoch': 1.0} 100%|█████████▉| 1228/1230 [24:09:17<02:20, 70.39s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4042 [2024-08-01 02:21:20,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.31 | bwd_microstep: 5359.59 | bwd_inner_microstep: 5340.53 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3566 [2024-08-01 02:21:29,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.84 | bwd_microstep: 5193.46 | bwd_inner_microstep: 5097.86 | bwd_allreduce_microstep: 95.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3670 [2024-08-01 02:21:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.01 | bwd_microstep: 4861.71 | bwd_inner_microstep: 4842.36 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2105 [2024-08-01 02:21:46,912] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.76 | bwd_microstep: 5254.52 | bwd_inner_microstep: 4845.86 | bwd_allreduce_microstep: 408.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-08-01 02:21:55,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.87 | bwd_microstep: 5140.89 | bwd_inner_microstep: 5088.99 | bwd_allreduce_microstep: 51.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-08-01 02:22:04,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.29 | bwd_microstep: 5054.15 | bwd_inner_microstep: 5015.91 | bwd_allreduce_microstep: 38.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-08-01 02:22:13,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.94 | bwd_microstep: 5222.93 | bwd_inner_microstep: 4818.39 | bwd_allreduce_microstep: 404.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-08-01 02:22:22,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-08-01 02:22:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.21 | bwd_microstep: 5003.05 | bwd_inner_microstep: 4950.05 | bwd_allreduce_microstep: 52.92 | step_microstep: 181.90 [2024-08-01 02:22:22,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29146.13 | bwd: 41090.28 | bwd_inner: 39999.90 | bwd_allreduce: 1089.90 | step: 182.48 100%|█████████▉| 1229/1230 [24:10:27<01:10, 70.45s/it] {'loss': 1.1693, 'learning_rate': 3.467277580271322e-11, 'epoch': 1.0} 100%|█████████▉| 1229/1230 [24:10:27<01:10, 70.45s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2369 [2024-08-01 02:22:31,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.95 
| bwd_microstep: 5644.27 | bwd_inner_microstep: 5211.91 | bwd_allreduce_microstep: 432.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-08-01 02:22:40,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3423.53 | bwd_microstep: 5325.20 | bwd_inner_microstep: 5221.85 | bwd_allreduce_microstep: 103.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920 [2024-08-01 02:22:49,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.12 | bwd_microstep: 5153.90 | bwd_inner_microstep: 5134.59 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2213 [2024-08-01 02:22:57,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.03 | bwd_microstep: 5264.31 | bwd_inner_microstep: 4857.04 | bwd_allreduce_microstep: 407.20 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2213 [2024-08-01 02:23:06,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.63 | bwd_microstep: 5064.53 | bwd_inner_microstep: 4669.71 | bwd_allreduce_microstep: 394.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666 [2024-08-01 02:23:15,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.35 | bwd_microstep: 5073.07 | bwd_inner_microstep: 4993.36 | bwd_allreduce_microstep: 79.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-08-01 02:23:23,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.64 | bwd_microstep: 5009.17 | bwd_inner_microstep: 4954.75 | bwd_allreduce_microstep: 54.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-08-01 02:23:32,609] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-08-01 02:23:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.26 | bwd_microstep: 5047.85 | bwd_inner_microstep: 4989.32 | bwd_allreduce_microstep: 58.47 | step_microstep: 182.60 [2024-08-01 02:23:32,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28663.42 | bwd: 41582.28 | bwd_inner: 40032.47 | bwd_allreduce: 1549.33 | step: 183.19 100%|██████████| 1230/1230 [24:11:38<00:00, 70.48s/it] {'loss': 1.0486, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 1230/1230 [24:11:38<00:00, 70.48s/it]petrel_client is not installed. If you read data locally instead of from ceph, ignore it. Replace train sampler!! petrel_client is not installed. Using PIL to load images.
[INFO|trainer.py:1962] 2024-08-01 02:23:33,967 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 87109.1288, 'train_samples_per_second': 1.807, 'train_steps_per_second': 0.014, 'train_loss': 1.175328787167867, 'epoch': 1.0}
100%|██████████| 1230/1230 [24:11:40<00:00, 70.81s/it]
[INFO|trainer.py:2936] 2024-08-01 02:24:01,449 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7
[INFO|configuration_utils.py:473] 2024-08-01 02:24:01,450 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/config.json
[INFO|configuration_utils.py:594] 2024-08-01 02:24:01,451 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/generation_config.json
[INFO|modeling_utils.py:2501] 2024-08-01 02:24:55,591 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2433] 2024-08-01 02:24:55,593 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-08-01 02:24:55,593 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-08-01 02:24:55,593 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-15w7/added_tokens.json
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     1.1753
  train_runtime            = 1 day, 0:11:49.12
  train_samples            =     157445
  train_samples_per_second =      1.807
  train_steps_per_second   =      0.014
wandb: | 3.222 MB of 3.222 MB uploaded
wandb:
wandb: Run history:
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/learning_rate ▄▇██████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁
wandb: train/loss ▆▅▆▇▅█▄▆▅▅▂▅▆▄▄▅▅▃▅▂▃▃▃▂▂▁▃▃▃▃▃▃▂▄▃▃▃▁▃▂
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 1.0
wandb: train/global_step 1230
wandb: train/learning_rate 0.0
wandb: train/loss 1.0486
wandb: train/total_flos 4.0987686361796444e+19
wandb: train/train_loss 1.17533
wandb: train/train_runtime 87109.1288
wandb: train/train_samples_per_second 1.807
wandb: train/train_steps_per_second 0.014
wandb:
wandb: 🚀 View run major-galaxy-29 at: https://wandb.ai/pku_kcl/huggingface/runs/lciklai1
wandb: ⭐️ View project at: https://wandb.ai/pku_kcl/huggingface
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240731_021148-lciklai1/logs
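The throughput figures in the train metrics block can be cross-checked by hand. The sketch below assumes, as the numbers suggest, that `train_samples_per_second` and `train_steps_per_second` are simply totals divided by `train_runtime` in seconds; the samples-per-step figure it prints is an inference from the two counters, not a value this log reports.

```python
from datetime import timedelta

# Values copied from the "train metrics" block and the wandb run summary above.
train_samples = 157445        # train_samples
global_step = 1230            # train/global_step
train_runtime_s = 87109.1288  # train_runtime, in seconds

# Assumption: throughput = total count / wall-clock runtime.
samples_per_second = train_samples / train_runtime_s
steps_per_second = global_step / train_runtime_s

# Average number of samples consumed per optimizer step (an inference, not a logged value).
samples_per_step = train_samples / global_step

print(f"train_samples_per_second = {samples_per_second:.3f}")
print(f"train_steps_per_second   = {steps_per_second:.3f}")
print(f"train_runtime            = {timedelta(seconds=train_runtime_s)}")
print(f"samples per step         ~ {samples_per_step:.1f}")
```

With these values the ratio of samples to optimizer steps comes out at about 128, which would be consistent with a global batch of roughly 128 samples per step; the log excerpt does not state the batch size or gradient-accumulation settings directly, so treat that reading as an assumption.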