out of memory #3
opened by LiMuyi
My GPU is an NVIDIA GeForce RTX 4090 with 24GB, but when I load this model it runs out of memory. I already set cache_8bit=True and use Exllamav2_HF as the loader. (My GPU is idle and no other model is running.)
Hi, I'm not getting OOMs, but responses are sometimes slow because this quant also spills into shared GPU memory. For now, you could try reducing the context size to 2k. Soon I'll upload a 2.8 bpw quant; that one fits perfectly on my 3090.
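To see why both suggestions (8-bit cache and a smaller context) help, here is a rough back-of-the-envelope sketch of KV-cache memory. The layer/head dimensions below are illustrative assumptions (roughly a 70B-class GQA model, not necessarily this exact model), so treat the numbers as order-of-magnitude only:

```python
def kv_cache_bytes(seq_len, bytes_per_elem, n_layers=80, n_kv_heads=8, head_dim=128):
    """Estimate KV-cache size: 2 tensors (K and V) per layer,
    each of shape [n_kv_heads, seq_len, head_dim].
    Layer/head counts here are assumed, illustrative values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# FP16 cache at 4k context vs. 8-bit cache at 2k context
fp16_4k = kv_cache_bytes(4096, 2)   # 2 bytes per element
int8_2k = kv_cache_bytes(2048, 1)   # 1 byte per element

print(f"fp16 @ 4k: {fp16_4k / 2**30:.2f} GiB")
print(f"int8 @ 2k: {int8_2k / 2**30:.2f} GiB")
```

The cache scales linearly with both context length and element width, so halving the context and halving the precision together cut cache VRAM by 4x; the model weights themselves still dominate, which is why a lower-bpw quant (e.g. 2.8 bpw) is the bigger lever for fitting in 24GB.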