GGUF quants? #2
opened by vladfaust
Blocked by https://github.com/ggerganov/llama.cpp/issues/7439; to be precise, llama.cpp currently lacks blocksparse attention support.
Edit: false alarm, bartowski's quants work well with a context size of 8192.
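For anyone else landing here, a minimal sketch of loading one of the GGUF quants with an 8192-token context via llama-cpp-python; the file name below is a placeholder, not an actual file from this repo:

```python
# Minimal sketch using llama-cpp-python; the GGUF file name is a
# placeholder, not an actual file from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # hypothetical path to one of bartowski's quants
    n_ctx=8192,                      # context size reported to work in this thread
)

out = llm("Hello, world!", max_tokens=32)
print(out["choices"][0]["text"])
```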
vladfaust changed discussion status to closed