[Error] Error when executing the example code

#3
by StarCycle - opened

Hi,

If I run the model with the example code in the repo folder, I get this error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-fd9293295145> in <cell line: 2>()
      1 import torch
----> 2 from modeling_siglip import SiglipVisionModel
      3 
      4 DEVICE = torch.device("cuda:0")
      5 PATCH_SIZE = 14

/content/siglip-so400m-14-980-flash-attn2-navit/modeling_siglip.py in <module>
     40     replace_return_docstrings,
     41 )
---> 42 from .configuration_siglip import SiglipConfig, SiglipTextConfig, SiglipVisionConfig
     43 
     44 

ImportError: attempted relative import with no known parent package
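
As far as I can tell, the relative import fails because modeling_siglip.py is imported as a top-level module rather than as part of a package. If it helps, here is an untested workaround sketch, assuming both .py files sit in the current working directory:

import pathlib

# Untested sketch: rewrite the relative import into an absolute one so the
# file can be imported outside a package.
src = pathlib.Path("modeling_siglip.py")
src.write_text(src.read_text().replace(
    "from .configuration_siglip import",
    "from configuration_siglip import",
))

from modeling_siglip import SiglipVisionModel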

If I replace the source code files in transformers (e.g., modeling_siglip.py) with the source code files in this repo, I get this error:

[screenshot of the error message]

Actually, the argument does exist:

[screenshot showing the argument]

If I run the code with:

import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-384-flash-attn2", trust_remote_code=True)
model.eval().cuda().half()

pixel_values = torch.randn(1, 3, 384, 384).cuda().half()
output = model.vision_model(pixel_values)

It does work, but the model only accepts images at 384x384 resolution. If I send a 512x512 image, I get a dimension mismatch error from the position embedding.
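
A minimal repro of that mismatch, reusing the model loaded above:

# 512x512 input on the 384-only checkpoint: the position embedding
# lookup fails with a size mismatch.
pixel_values = torch.randn(1, 3, 512, 512).cuda().half()
output = model.vision_model(pixel_values)  # raises the dimension mismatch error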

Could you please modify the example code so it can be executed? How can I run the model successfully on Google Colab?

Hi @StarCycle
Have you tried model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit"); model.vision_model?

Thanks @VictorSanh !

It should be OK with

import torch
from transformers import AutoModel

pixel_values = torch.randn(1, 3, 224, 384).cuda().half() # any resolution here
model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit", trust_remote_code=True)
model.eval().cuda().half()
output = model.vision_model(pixel_values)

Is it necessary to specify patch_attention_mask in the README example?
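
For reference, here is roughly what I mean. The argument name and mask shape are my assumptions from skimming the modeling code, not verified:

# Assumed usage (unverified): one boolean entry per 14x14 patch,
# all True for a single non-padded image, reusing pixel_values from above.
PATCH_SIZE = 14
patch_attention_mask = torch.ones(
    1, 224 // PATCH_SIZE, 384 // PATCH_SIZE, dtype=torch.bool
).cuda()
output = model.vision_model(pixel_values, patch_attention_mask=patch_attention_mask)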
