Don't need 200k?
#1
by
bdambrosio
- opened
I would love to run this model in 8-bit, and I don't need more than 16k of 'real' context.
But exllamav2 says there isn't enough memory (2x4090 = 48GB).
Is there a change I can make to config.json, or to the exllamav2 config, that would let me load it?
6.0bpw, or even 6.5, seems to leave plenty of VRAM free, so I'm not sure why 8-bit won't load.
Any ideas? Thanks!
Just set your context in ooba to reduce the max tokens. Or you can edit config.json and change this line: `"max_position_embeddings": 200000,`
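For example, lowering that value caps the context length the loader allocates cache for. A sketch of the edited line in config.json (16384 here is just an illustrative value for ~16k context; the other keys in the file stay untouched):

```json
{
  "max_position_embeddings": 16384
}
```

Note this only changes how much KV cache gets allocated at load time; it doesn't retrain or alter the model weights.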
I don't think there's much quality difference going down to 6.0bpw. But if you don't need the full context, 8.0bpw with reduced max tokens should be possible.