out of memory #3
opened by LiMuyi
My GPU is an NVIDIA GeForce RTX 4090 with 24GB, but when I load this model it runs out of memory. I already set cache_8bit=True and use Exllamav2_HF as the loader. (My GPU is idle and no other model is running.)
Hi, I'm not getting OOMs, but responses are sometimes slow because this quant also spills into shared GPU memory. For now, you could try reducing the context size to 2k. Soon I'll upload a 2.8 bpw quant; that one fits perfectly on my 3090.
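To see why both suggestions (8-bit cache and a smaller context) help, here is a rough back-of-the-envelope sketch of KV-cache memory. The layer/head dimensions below are illustrative assumptions (roughly a 70B-class GQA model, not necessarily this exact model), so treat the numbers as order-of-magnitude only:

```python
def kv_cache_bytes(seq_len, bytes_per_elem, n_layers=80, n_kv_heads=8, head_dim=128):
    """Estimate KV-cache size: 2 tensors (K and V) per layer,
    each of shape [n_kv_heads, seq_len, head_dim].
    Layer/head counts here are assumed, illustrative values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# FP16 cache at 4k context vs. 8-bit cache at 2k context
fp16_4k = kv_cache_bytes(4096, 2)   # 2 bytes per element
int8_2k = kv_cache_bytes(2048, 1)   # 1 byte per element

print(f"fp16 @ 4k: {fp16_4k / 2**30:.2f} GiB")
print(f"int8 @ 2k: {int8_2k / 2**30:.2f} GiB")
```

The cache scales linearly with both context length and element width, so halving the context and halving the precision together cut cache VRAM by 4x; the model weights themselves still dominate, which is why a lower-bpw quant (e.g. 2.8 bpw) is the bigger lever for fitting in 24GB.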