python train_ddp_spawn.py --base configs/train-v01.yaml --no-test True --train True --logdir outputs/logs/train-v01 [2024-09-29 13:19:46,460] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [WARNING] async_io requires the dev libaio .so object and headers but these were not found. [WARNING] async_io: please install the libaio-dev package with apt [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0 [WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible 2024-09-29 13:19:55.519392: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2024-09-29 13:19:55.698222: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used. 2024-09-29 13:19:56.301373: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-09-29 13:19:58.557393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT Global seed set to 2300 [09/29 13:20:06 VTDM]: Running on GPUs 7, [09/29 13:20:06 VTDM]: Use the strategy of deepspeed_stage_2 [09/29 13:20:06 VTDM]: Pytorch lightning trainer config: {'gpus': '7,', 'logger_refresh_rate': 5, 'check_val_every_n_epoch': 1, 'max_epochs': 50, 'accelerator': 'cuda', 'strategy': 'deepspeed_stage_2', 'precision': 16} VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing VideoTransformerBlock is using checkpointing Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False Initialized embedder #1: AesEmbedder with 343490018 params. Trainable: False Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False Restored from /mnt/afs_intern/yanghaibo/datas/download_checkpoints/svd_checkpoints/stable-video-diffusion-img2vid-xt/svd_xt_image_decoder.safetensors with 312 missing and 0 unexpected keys Missing Keys: ['conditioner.embedders.1.aesthetic_model.positional_embedding', 'conditioner.embedders.1.aesthetic_model.text_projection', 'conditioner.embedders.1.aesthetic_model.logit_scale', 'conditioner.embedders.1.aesthetic_model.visual.class_embedding', 'conditioner.embedders.1.aesthetic_model.visual.positional_embedding', 'conditioner.embedders.1.aesthetic_model.visual.proj', 'conditioner.embedders.1.aesthetic_model.visual.conv1.weight', 'conditioner.embedders.1.aesthetic_model.visual.ln_pre.weight', 'conditioner.embedders.1.aesthetic_model.visual.ln_pre.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.0.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.1.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.2.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.3.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.4.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.5.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.6.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.7.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.8.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.9.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.10.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.11.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.12.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.13.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.14.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.15.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.16.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.17.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.18.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.19.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.20.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.21.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.22.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.attn.in_proj_weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.attn.in_proj_bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.attn.out_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.attn.out_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.ln_1.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.ln_1.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.mlp.c_fc.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.mlp.c_fc.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.mlp.c_proj.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.mlp.c_proj.bias', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.ln_2.weight', 'conditioner.embedders.1.aesthetic_model.visual.transformer.resblocks.23.ln_2.bias', 'conditioner.embedders.1.aesthetic_model.visual.ln_post.weight', 'conditioner.embedders.1.aesthetic_model.visual.ln_post.bias', 'conditioner.embedders.1.aesthetic_model.token_embedding.weight', 'conditioner.embedders.1.aesthetic_model.ln_final.weight', 'conditioner.embedders.1.aesthetic_model.ln_final.bias', 'conditioner.embedders.1.aesthetic_mlp.layers.0.weight', 'conditioner.embedders.1.aesthetic_mlp.layers.0.bias', 'conditioner.embedders.1.aesthetic_mlp.layers.2.weight', 'conditioner.embedders.1.aesthetic_mlp.layers.2.bias', 'conditioner.embedders.1.aesthetic_mlp.layers.4.weight', 'conditioner.embedders.1.aesthetic_mlp.layers.4.bias', 'conditioner.embedders.1.aesthetic_mlp.layers.6.weight', 'conditioner.embedders.1.aesthetic_mlp.layers.6.bias', 'conditioner.embedders.1.aesthetic_mlp.layers.7.weight', 'conditioner.embedders.1.aesthetic_mlp.layers.7.bias'] /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/loggers/test_tube.py:104: LightningDeprecationWarning: The TestTubeLogger is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the `pytorch_lightning.loggers.TensorBoardLogger` as an alternative. rank_zero_deprecation( [09/29 13:21:06 VTDM]: Merged modelckpt-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'outputs/logs/train-v01/2024-09-29T13-20-04_train-v01_00/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_weights_only': True}} [09/29 13:21:06 VTDM]: Caution: Saving checkpoints every n train steps without deleting. This might require some free space. [09/29 13:21:06 VTDM]: Merged trainsteps-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'outputs/logs/train-v01/2024-09-29T13-20-04_train-v01_00/checkpoints/trainstep_checkpoints', 'filename': '{epoch:06}-{step:09}', 'verbose': True, 'save_top_k': -1, 'every_n_train_steps': 3000, 'save_weights_only': False}} [09/29 13:21:06 VTDM]: Done in building trainer kwargs. GPU available: True, used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs ============= length of dataset 1 ============= [09/29 13:21:07 VTDM]: Set up dataset. [09/29 13:21:07 VTDM]: accumulate_grad_batches = 1 [09/29 13:21:07 VTDM]: Setting learning rate to 1.00e-05 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 1 (batchsize) * 1.00e-05 (base_lr) /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:116: UserWarning: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop. rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.") /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:271: LightningDeprecationWarning: The `on_keyboard_interrupt` callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the `on_exception` callback hook instead. rank_zero_deprecation( /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:287: LightningDeprecationWarning: Base `Callback.on_train_batch_end` hook signature has changed in v1.5. The `dataloader_idx` argument will be removed in v1.7. rank_zero_deprecation( Global seed set to 2300 initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1 /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py:625: UserWarning: Inferring the batch size for internal deepspeed logging from the `train_dataloader()`. If you require skipping this, please pass `Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size))` rank_zero_warn( Enabling DeepSpeed FP16. /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/core/datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup. rank_zero_deprecation( LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7] You have not specified an optimizer or scheduler within the DeepSpeed config. Using `configure_optimizers` to define optimizer and scheduler. Project config data: target: sgm.data.video_dataset.VideoDataset params: base_folder: datas/OBJAVERSE-LVIS-example/images eval_folder: validation_set_example width: 512 height: 512 sample_frames: 16 batch_size: 1 num_workers: 1 model: target: vtdm.vtdm_gen_v01.VideoLDM base_learning_rate: 1.0e-05 params: input_key: video scale_factor: 0.18215 log_keys: caption num_samples: 16 trained_param_keys: - all en_and_decode_n_samples_a_time: 16 disable_first_stage_autocast: true ckpt_path: /mnt/afs_intern/yanghaibo/datas/download_checkpoints/svd_checkpoints/stable-video-diffusion-img2vid-xt/svd_xt_image_decoder.safetensors denoiser_config: target: sgm.modules.diffusionmodules.denoiser.Denoiser params: scaling_config: target: sgm.modules.diffusionmodules.denoiser_scaling.VScalingWithEDMcNoise network_config: target: sgm.modules.diffusionmodules.video_model.VideoUNet params: adm_in_channels: 768 num_classes: sequential use_checkpoint: true in_channels: 8 out_channels: 4 model_channels: 320 attention_resolutions: - 4 - 2 - 1 num_res_blocks: 2 channel_mult: - 1 - 2 - 4 - 4 num_head_channels: 64 use_linear_in_transformer: true transformer_depth: 1 context_dim: 1024 spatial_transformer_attn_type: softmax-xformers extra_ff_mix_layer: true use_spatial_context: true merge_strategy: learned_with_images video_kernel_size: - 3 - 1 - 1 conditioner_config: target: sgm.modules.GeneralConditioner params: emb_models: - is_trainable: false input_key: cond_frames_without_noise ucg_rate: 0.1 target: sgm.modules.encoders.modules.FrozenOpenCLIPImagePredictionEmbedder params: n_cond_frames: 1 n_copies: 1 open_clip_embedding_config: target: sgm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder params: version: ckpts/open_clip_pytorch_model.bin freeze: true - is_trainable: false input_key: video ucg_rate: 0.0 target: vtdm.encoders.AesEmbedder - is_trainable: false input_key: elevation target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND params: outdim: 256 - input_key: cond_frames is_trainable: false ucg_rate: 0.1 target: sgm.modules.encoders.modules.VideoPredictionEmbedderWithEncoder params: disable_encoder_autocast: true n_cond_frames: 1 n_copies: 16 is_ae: true encoder_config: target: sgm.models.autoencoder.AutoencoderKLModeOnly params: embed_dim: 4 monitor: val/rec_loss ddconfig: attn_type: vanilla-xformers double_z: true z_channels: 4 resolution: 256 in_channels: 3 out_ch: 3 ch: 128 ch_mult: - 1 - 2 - 4 - 4 num_res_blocks: 2 attn_resolutions: [] dropout: 0.0 lossconfig: target: torch.nn.Identity - input_key: cond_aug is_trainable: false target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND params: outdim: 256 first_stage_config: target: sgm.models.autoencoder.AutoencoderKL params: embed_dim: 4 monitor: val/rec_loss ddconfig: attn_type: vanilla-xformers double_z: true z_channels: 4 resolution: 256 in_channels: 3 out_ch: 3 ch: 128 ch_mult: - 1 - 2 - 4 - 4 num_res_blocks: 2 attn_resolutions: [] dropout: 0.0 lossconfig: target: torch.nn.Identity loss_fn_config: target: sgm.modules.diffusionmodules.loss.StandardDiffusionLoss params: num_frames: 16 batch2model_keys: - num_video_frames - image_only_indicator sigma_sampler_config: target: sgm.modules.diffusionmodules.sigma_sampling.EDMSampling params: p_mean: 1.0 p_std: 1.6 loss_weighting_config: target: sgm.modules.diffusionmodules.loss_weighting.VWeighting sampler_config: target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler params: num_steps: 25 verbose: true discretization_config: target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization params: sigma_max: 700.0 guider_config: target: sgm.modules.diffusionmodules.guiders.LinearPredictionGuider params: num_frames: 16 max_scale: 2.5 min_scale: 1.0 Lightning config trainer: gpus: 7, logger_refresh_rate: 5 check_val_every_n_epoch: 1 max_epochs: 50 accelerator: cuda strategy: deepspeed_stage_2 precision: 16 callbacks: image_logger: target: vtdm.callbacks.ImageLogger params: log_on_batch_idx: true increase_log_steps: false log_first_step: true batch_frequency: 200 max_images: 8 clamp: true log_images_kwargs: 'N': 8 sample: true ucg_keys: - cond_frames - cond_frames_without_noise metrics_over_trainsteps_checkpoint: target: pytorch_lightning.callbacks.ModelCheckpoint params: every_n_train_steps: 3000 save_weights_only: false | Name | Type | Params ------------------------------------------------------------ 0 | model | OpenAIWrapper | 1.5 B 1 | denoiser | Denoiser | 0 2 | conditioner | GeneralConditioner | 1.1 B 3 | first_stage_model | AutoencoderKL | 83.7 M 4 | loss_fn | StandardDiffusionLoss | 0 ------------------------------------------------------------ 1.5 B Trainable params 1.2 B Non-trainable params 2.7 B Total params 5,438.442 Total estimated model params size (MB) /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:617: UserWarning: Checkpoint directory outputs/logs/train-v01/2024-09-29T13-20-04_train-v01_00/checkpoints exists and is not empty. rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.") [09/29 13:21:13 VTDM]: Epoch: 0, batch_num: inf /mnt/afs_intern/yanghaibo/installed/anaconda3/envs/general/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:56: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`. warning_cache.warn( ############################## Sampling setting ############################## Sampler: EulerEDMSampler Discretization: EDMDiscretization Guider: LinearPredictionGuider Sampling with EulerEDMSampler for 26 steps: 0%| | 0/26 [00:00 loss: 0.025749 [09/29 13:22:48 VTDM]: [Epoch 0] [Batch 10/inf 9.47 s/batch] => loss: 0.034793 Average Epoch time: 94.71 seconds Average Peak memory 43059.05 MiB [09/29 13:23:36 VTDM]: Epoch: 1, batch_num: inf ############################## Sampling setting ############################## Sampler: EulerEDMSampler Discretization: EDMDiscretization Guider: LinearPredictionGuider Sampling with EulerEDMSampler for 26 steps: 96%|██████████████████████████████████████████████████████████████████████████████████████████▍ | 25/26 [00:25<00:01, 1.02s/it] [09/29 13:24:19 VTDM]: [Epoch 1] [Batch 5/inf 8.63 s/batch] => loss: 0.030595 [09/29 13:24:31 VTDM]: [Epoch 1] [Batch 10/inf 5.54 s/batch] => loss: 0.113596 Average Epoch time: 55.36 seconds Average Peak memory 43059.05 MiB [09/29 13:25:18 VTDM]: Epoch: 2, batch_num: inf ############################## Sampling setting ############################## Sampler: EulerEDMSampler Discretization: EDMDiscretization Guider: LinearPredictionGuider Sampling with EulerEDMSampler for 26 steps: 96%|██████████████████████████████████████████████████████████████████████████████████████████▍ | 25/26 [00:25<00:01, 1.02s/it] [09/29 13:26:01 VTDM]: [Epoch 2] [Batch 5/inf 8.58 s/batch] => loss: 0.127297 [09/29 13:26:13 VTDM]: [Epoch 2] [Batch 10/inf 5.51 s/batch] => loss: 0.039552 Average Epoch time: 55.14 seconds Average Peak memory 43059.05 MiB [09/29 13:26:59 VTDM]: Epoch: 3, batch_num: inf ############################## Sampling setting ##############################