Don't need 200k?
#1
by
bdambrosio
- opened
I would love to run this model in 8-bit, and I don't need more than 16k of 'real' context.
But exllamav2 says there isn't enough memory (2x4090 = 48GB).
Is there a change I can make to config.json, or to the exllamav2 config, that would let me load it?
6.0bpw, or even 6.5, seems to leave plenty of VRAM free, so I'm not sure why 8-bit won't load.
Any ideas? Thanks!
Just set your context in ooba to reduce the max tokens. Or you can edit config.json and change this line: `"max_position_embeddings": 200000,`
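For example, lowering that value caps the context length the loader allocates cache for. A sketch of the edited line in config.json (16384 here is just an illustrative value for ~16k context; the other keys in the file stay untouched):

```json
{
  "max_position_embeddings": 16384
}
```

Note this only changes how much KV cache gets allocated at load time; it doesn't retrain or alter the model weights.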
I don't think there's much quality difference going down to 6.0bpw. But if you don't need the full context, 8.0bpw with reduced max tokens should be possible.