long response times #27
by FantasticMrCat42 · opened
What is the best way to lower response time from the model? Currently I am running this on a laptop with an RTX 4080, so I don't have 24 GB of VRAM. I have had to use `torch_dtype=torch.float16` just to run inference at all, and generation still takes over a minute. Will lowering the image resolution help?
@FantasticMrCat42 You can try 4-bit quantization, which also fits on a free-tier Google Colab GPU: https://t.co/u4AMLbZuAU
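A minimal sketch of what 4-bit loading looks like with `bitsandbytes` via `transformers` (assuming the model is a standard causal LM checkpoint; `"model-id"` is a placeholder for the actual repo id):

```python
# Sketch: load a model in 4-bit (NF4) with bitsandbytes via transformers.
# "model-id" is a placeholder -- substitute the checkpoint you are using.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization config: NF4 weights, fp16 compute for the matmuls.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id",                      # placeholder checkpoint name
    quantization_config=bnb_config,  # replaces torch_dtype=torch.float16
    device_map="auto",               # spread layers across available GPU/CPU
)
```

Compared to plain fp16, 4-bit roughly quarters the weight memory, which can keep the whole model on the 4080's VRAM instead of spilling to CPU, and avoiding that spill is usually where most of the speedup comes from.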