GGUF quants?

#2
by vladfaust - opened

Blocked by https://github.com/ggerganov/llama.cpp/issues/7439; to be precise, llama.cpp currently lacks blocksparse attention support.

Edit: false alarm, bartowski's quants work well with a context size of 8192.

vladfaust changed discussion status to closed
