GGUF quants? #2
opened by vladfaust
Blocked by https://github.com/ggerganov/llama.cpp/issues/7439; to be precise, llama.cpp currently lacks blocksparse attention support.
Edit: false alarm, bartowski's quants work well with a context size of 8192.
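For anyone else landing here, a minimal sketch of loading one of the GGUF quants with an 8192-token context via llama-cpp-python; the file name below is a placeholder, not an actual file from this repo:

```python
# Minimal sketch using llama-cpp-python; the GGUF file name is a
# placeholder, not an actual file from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # hypothetical path to one of bartowski's quants
    n_ctx=8192,                      # context size reported to work in this thread
)

out = llm("Hello, world!", max_tokens=32)
print(out["choices"][0]["text"])
```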
vladfaust changed discussion status to closed