
Any plans of releasing q2_k_s quants?

#1
by stduhpf - opened

I was about to ask this question on the v1 repo when I found out you were already rolling out v2.
I can barely run the normal q2_k (m) by completely filling up both my RAM and VRAM; using a q2_k_s would give me some headroom.
(For example, the q2_k_s from Nexesenex's Miqu requants is barely worse than the original Miqu q2_k, and it lets me do something else on my computer while inference is running.)
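For anyone in a similar spot: the RAM/VRAM split described above is controlled by how many layers you offload to the GPU. A minimal sketch with llama-cpp-python, assuming a local Q2_K GGUF (the file name and layer count here are hypothetical, tune them to your hardware):

```python
from llama_cpp import Llama

# Partial offload: n_gpu_layers controls how many transformer layers
# live in VRAM; the rest stay in system RAM. Lowering it leaves
# headroom for other applications at the cost of slower inference.
llm = Llama(
    model_path="MiquMaid-v2-70B.q2_k.gguf",  # hypothetical local path
    n_gpu_layers=40,   # reduce if VRAM fills up; 0 = CPU only
    n_ctx=4096,        # smaller context also saves memory
)

out = llm("### Instruction: Say hello.\n### Response:", max_tokens=32)
print(out["choices"][0]["text"])
```

The same knob exists on the llama.cpp CLI as `-ngl` / `--n-gpu-layers`.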

Not speaking for NeverSleep, but I do have a bunch of imatrix quants running for 70B & 70B-DPO; the first ones should be up within an hour.
V1 versions are available at Kooten/MiquMaid-v1-70B-IQ2-GGUF.

NeverSleep org

Thanks for that! stduhpf, you can use his quants; not sure if Q2_K_S would be usable tbh, IQ2 would be the way to go.

Thank you, but sadly IQ2 quants don't work on the Vulkan or OpenCL backends of llama.cpp, so I can't offload them to my consumer AMD GPU...
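Until those backends support the IQ quant types, one workaround is to attempt the IQ2 file and fall back to a K-quant that Vulkan/OpenCL can run. A rough sketch with llama-cpp-python; both file names are hypothetical, and the broad `except` is deliberate since the exact error raised for an unsupported quant varies by version and backend:

```python
from llama_cpp import Llama

def load_with_fallback(iq2_path: str, q2k_path: str, gpu_layers: int = 40) -> Llama:
    """Try the smaller IQ2 file first; fall back to Q2_K if the
    backend (e.g. Vulkan/OpenCL) can't handle that quant type."""
    try:
        return Llama(model_path=iq2_path, n_gpu_layers=gpu_layers)
    except Exception:
        # IQ quants need backend kernels that Vulkan/OpenCL lacked at
        # the time of this thread; K-quants are supported more widely.
        return Llama(model_path=q2k_path, n_gpu_layers=gpu_layers)

llm = load_with_fallback(
    "MiquMaid-v2-70B.IQ2_XS.gguf",  # hypothetical path
    "MiquMaid-v2-70B.q2_k.gguf",    # hypothetical path
)
```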

NeverSleep org

@Kooten you are a legend as always, thx!
