Hardware requirements #10
opened by ZahirHamroune
Same issue. Cannot seem to figure it out.
Do you get the error when loading the model or when running inference?
Same experience here. Couldn't get it running on an RTX 4070 with 16 GB of RAM, and couldn't get it going on Colab with 15 GB of VRAM either.
```python
from transformers import AutoProcessor

# The default range for the number of visual tokens per image is 4-16384.
# You can set min_pixels and max_pixels according to your needs, e.g. a
# token count range of 256-1280, to balance speed and memory usage.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
```
Try setting max_pixels when initializing the processor. Otherwise it will use a ton of VRAM when dealing with high-resolution images.
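To see why this helps: the limits are expressed as multiples of 28*28 because each visual token covers roughly a 28x28 pixel area, so capping the pixel budget caps the number of tokens (and therefore the activation memory) per image. Here is a rough back-of-the-envelope sketch of that relationship; `estimate_visual_tokens` is a hypothetical helper, not part of the transformers API, and the clamping is an approximation of the processor's actual resizing logic.

```python
# Hypothetical helper: estimate the visual token count for an image
# under the pixel limits used above. Assumes each visual token covers
# about a 28x28 pixel area, which is why the limits are written as
# multiples of 28*28.

PATCH_AREA = 28 * 28

def estimate_visual_tokens(width: int, height: int,
                           min_pixels: int = 256 * PATCH_AREA,
                           max_pixels: int = 1280 * PATCH_AREA) -> int:
    """Clamp the image's pixel count to [min_pixels, max_pixels] and
    convert to an approximate visual token count."""
    pixels = min(max(width * height, min_pixels), max_pixels)
    return pixels // PATCH_AREA

# A 4K frame (3840x2160, ~8.3M pixels) would cost ~10,500 tokens
# unconstrained; with max_pixels set it is capped at 1280 tokens.
print(estimate_visual_tokens(3840, 2160))  # 1280
```

So a single high-resolution image that would otherwise blow past 16 GB of VRAM gets resized down to a bounded token budget, which is why setting `max_pixels` makes the difference on smaller GPUs.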