k-quants?
#1, opened by speedorama
After llama.cpp PR #2148 (https://github.com/ggerganov/llama.cpp/pull/2148), k-quants should be possible on this model.
Yes, they are. I'm making k-quants with 32001-vocab models now, like WizardLM 13B v1.1. I just haven't gone back to add them to the models I'd already uploaded.
I'll try and do it soon.
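For anyone wondering why the 32001 vocab size matters here: as far as I understand it, k-quants pack weights into super-blocks of 256 values, so a dimension of 32001 does not divide evenly while 32000 does. The snippet below is only a sketch of that arithmetic. The 256 figure is llama.cpp's usual super-block size, the helper name is made up for illustration, and none of this is meant to describe what the PR itself changes.

```python
# Assumption: k-quants group weights into super-blocks of 256 values.
# fits_super_blocks is a hypothetical helper, not a llama.cpp function.
SUPER_BLOCK = 256


def fits_super_blocks(dim: int, block: int = SUPER_BLOCK) -> bool:
    """Return True when `dim` splits evenly into super-blocks of size `block`."""
    return dim % block == 0


if __name__ == "__main__":
    # Compare the stock LLaMA vocab size with the 32001-token variant.
    for vocab in (32000, 32001):
        leftover = vocab % SUPER_BLOCK
        print(f"vocab={vocab} divisible={fits_super_blocks(vocab)} leftover={leftover}")
```

The one extra token is enough to leave a remainder, which would explain why these models needed special handling before k-quants worked on them.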
I've tried making my own k-quants from the f16 version of this model you provided, but when I quantize it to Q5_K_S the resulting model quickly becomes incoherent.