GGUF Quantisations

#3
by smcleod - opened

I've had a crack (for the first time) at quantising models with Q3_K_M and Q4_K_M GGUF variants, in case anyone finds them useful. I've also pushed these to Ollama's model registry.

Disclaimer - I literally read how to quantise models yesterday, so while I think it went to plan, please do let me know if there are any issues!
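
In case it helps anyone reproduce this, here's a minimal sketch of the usual llama.cpp quantisation workflow, driven from Python. The paths, filenames, and script names below are placeholder assumptions (based on the llama.cpp tree circa early/mid 2024), not the exact commands used for these files:

```python
# Minimal sketch of a GGUF quantisation workflow with llama.cpp.
# All paths and filenames below are placeholders, not the exact ones used here.
import subprocess

LLAMA_CPP = "path/to/llama.cpp"   # hypothetical: a built llama.cpp checkout
MODEL_DIR = "path/to/hf-model"    # hypothetical: the downloaded HF model directory
F16_GGUF = "model-f16.gguf"       # intermediate full-precision GGUF

# 1. Convert the Hugging Face weights to an f16 GGUF file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantise the f16 GGUF down to each target type.
for qtype in ("Q3_K_M", "Q4_K_M"):
    subprocess.run(
        [f"{LLAMA_CPP}/quantize", F16_GGUF, f"model-{qtype.lower()}.gguf", qtype],
        check=True,
    )
```

From there, a quantised file can be loaded into Ollama via a Modelfile pointing at it (e.g. `FROM ./model-q4_k_m.gguf`) and published with `ollama create` and `ollama push`.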

It doesn't seem to be working for me - the output is garbage. Did it work for you?

I added some more quants: https://huggingface.co/gobean/Smaug-Mixtral-v0.1-GGUF, built with llama.cpp as of 2024-04-18.

Mixtral Instruct worked better for me with the Qx_0 quants, so I used those - I'm not sure why the Qx_K_y variants behave differently. Few-shot output tests look good, and Q4_0 is fast enough for regular use at ~18 GB of VRAM.
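
If anyone wants to run a similar few-shot sanity check on one of these quants, here's a quick sketch using the llama-cpp-python bindings (the model filename and prompt are placeholders; `n_gpu_layers=-1` offloads all layers to the GPU):

```python
# Quick sanity check of a GGUF quant via the llama-cpp-python bindings.
# The model filename and prompt are placeholders, not the exact files above.
from llama_cpp import Llama

llm = Llama(
    model_path="Smaug-Mixtral-v0.1-q4_0.gguf",  # hypothetical local filename
    n_ctx=4096,        # context window for the test
    n_gpu_layers=-1,   # offload all layers to the GPU if VRAM allows
)

# A trivial few-shot prompt: garbage output here usually means a broken quant.
prompt = (
    "Q: What is 2 + 2?\nA: 4\n"
    "Q: What is the capital of France?\nA: Paris\n"
    "Q: What colour is the sky on a clear day?\nA:"
)
out = llm(prompt, max_tokens=16, stop=["\n"])
print(out["choices"][0]["text"].strip())  # expect something like "Blue"
```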

smcleod changed discussion status to closed
