Issue about the checkpoint file.
python xclip.py --train False --task "detection" --load_pre_trained_model_state "../models/invid_xclip_i2v_i2v_best_model.pth" --fake_videos_path "../datasets/fake" --real_videos_path "../datasets/real" --label_number 2
When I run this command, I get the following error:
Traceback (most recent call last):
File "D:\Lily\Projects\MMVGM\detection_and_source_tracing\xclip.py", line 277, in <module>
main()
File "D:\Lily\Projects\MMVGM\detection_and_source_tracing\xclip.py", line 265, in main
video_cls.load_state_dict(torch.load(args.load_pre_trained_model_state))
File "C:\Users\22844\miniforge3\envs\nvss\lib\site-packages\torch\nn\modules\module.py", line 2189, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VideoClassifier:
size mismatch for classifier.out_layer.weight: copying a param with shape torch.Size([9, 32]) from checkpoint, the shape in current model is torch.Size([2, 32]).
size mismatch for classifier.out_layer.bias: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([2]).
I checked the SHA256 and it is C1AB5A20B025A90155B9D1B4D518A44A1E8F74BBBC33364E5DF07DD800478912, the same as the file on Hugging Face.
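A quick way to confirm where the mismatch comes from is to inspect the head shapes stored in the checkpoint before building the model. The sketch below uses a dummy state dict with the shapes from the traceback above; to check the real file, replace it with `torch.load("../models/invid_xclip_i2v_i2v_best_model.pth", map_location="cpu")`.

```python
import torch

# Dummy state dict standing in for the real checkpoint; the key names and
# shapes below are taken from the size-mismatch messages in the traceback.
state = {
    "classifier.out_layer.weight": torch.zeros(9, 32),
    "classifier.out_layer.bias": torch.zeros(9),
}

# Print the output-layer shapes: a (9, 32) weight means the checkpoint
# was saved from a 9-class head, not the 2-class one built here.
for name, tensor in state.items():
    if "out_layer" in name:
        print(name, tuple(tensor.shape))
```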
Hi, I think the problem is that the checkpoint was saved from a model with num_classes = 9, while xclip.py builds the model with 2, so you need to set num_class to 9 instead of 2.
In fact, when I set label_number (the num_classes) to 9, both invid_xclip_i2v_i2v_best_model.pth (for detection) and invid_xclip_st_best_model.pth (for source tracing) load and run successfully, which is a bit strange. However, the detection model then outputs multiple category values instead of the expected true or false.
Thanks for your comments. I think that is because we placed the source tracing model in the detection group, which caused this problem. To better fit your work, you can use the MAE model for detection and source tracing. The detection model is expected to output only 0 and 1.
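If you still want to reuse the 9-class checkpoint as a starting point for a 2-class detector, one common workaround is to drop the mismatched head weights and load the rest with strict=False, then fine-tune the fresh head. This is only a sketch with a toy stand-in module (the real VideoClassifier has more layers; only the out_layer names are assumed to match the checkpoint keys):

```python
import torch
import torch.nn as nn

# Toy stand-in for VideoClassifier: only the head shapes matter here.
class Head(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.out_layer = nn.Linear(32, num_classes)

ckpt = Head(9).state_dict()  # simulates the 9-class checkpoint
model = Head(2)              # the 2-class detection model

# Drop the mismatched head weights and load everything else non-strictly;
# the fresh 2-way out_layer keeps its random init and needs fine-tuning.
filtered = {k: v for k, v in ckpt.items() if "out_layer" not in k}
missing, unexpected = model.load_state_dict(filtered, strict=False)
print(sorted(missing))  # → ['out_layer.bias', 'out_layer.weight']
```

load_state_dict returns the missing and unexpected keys, so you can verify that only the head parameters were left at their random initialization.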