GGUF Quantisations

#3
by smcleod - opened

I've had a crack (for the first time) at quantising models with Q3_K_M and Q4_K_M GGUF variants, in case anyone finds them useful. I've also pushed these to Ollama's model registry.

Disclaimer - I literally read how to quantise models yesterday, so while I think it went to plan, please do let me know if there are any issues!
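
In case it helps anyone reproduce this, here's a minimal sketch of the usual llama.cpp quantisation workflow, driven from Python. The paths, filenames, and script names below are placeholder assumptions (based on the llama.cpp tree circa early/mid 2024), not the exact commands used for these files:

```python
# Minimal sketch of a GGUF quantisation workflow with llama.cpp.
# All paths and filenames below are placeholders, not the exact ones used here.
import subprocess

LLAMA_CPP = "path/to/llama.cpp"   # hypothetical: a built llama.cpp checkout
MODEL_DIR = "path/to/hf-model"    # hypothetical: the downloaded HF model directory
F16_GGUF = "model-f16.gguf"       # intermediate full-precision GGUF

# 1. Convert the Hugging Face weights to an f16 GGUF file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantise the f16 GGUF down to each target type.
for qtype in ("Q3_K_M", "Q4_K_M"):
    subprocess.run(
        [f"{LLAMA_CPP}/quantize", F16_GGUF, f"model-{qtype.lower()}.gguf", qtype],
        check=True,
    )
```

From there, a quantised file can be loaded into Ollama via a Modelfile pointing at it (e.g. `FROM ./model-q4_k_m.gguf`) and published with `ollama create` and `ollama push`.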

It doesn't seem to be working for me - the output is garbage. Did it work for you?

I added some more quants: https://huggingface.co/gobean/Smaug-Mixtral-v0.1-GGUF, built with llama.cpp as of 2024-04-18.

Mixtral Instruct worked better for me with the Qx_0 quants, so I used those - I'm not sure why the Qx_K_y variants behave differently. Few-shot output tests look good, and Q4_0 is fast enough for regular use at ~18 GB of VRAM.
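
If anyone wants to run a similar few-shot sanity check on one of these quants, here's a quick sketch using the llama-cpp-python bindings (the model filename and prompt are placeholders; `n_gpu_layers=-1` offloads all layers to the GPU):

```python
# Quick sanity check of a GGUF quant via the llama-cpp-python bindings.
# The model filename and prompt are placeholders, not the exact files above.
from llama_cpp import Llama

llm = Llama(
    model_path="Smaug-Mixtral-v0.1-q4_0.gguf",  # hypothetical local filename
    n_ctx=4096,        # context window for the test
    n_gpu_layers=-1,   # offload all layers to the GPU if VRAM allows
)

# A trivial few-shot prompt: garbage output here usually means a broken quant.
prompt = (
    "Q: What is 2 + 2?\nA: 4\n"
    "Q: What is the capital of France?\nA: Paris\n"
    "Q: What colour is the sky on a clear day?\nA:"
)
out = llm(prompt, max_tokens=16, stop=["\n"])
print(out["choices"][0]["text"].strip())  # expect something like "Blue"
```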

smcleod changed discussion status to closed
