
Any plans of releasing q2_k_s quants?

#1
by stduhpf - opened

I was about to ask this question on the v1 repo when I found out you were already rolling out v2.
I can barely run the normal q2_k (m) by completely filling up both my RAM and VRAM; using a q2_k_s would give me some headroom.
(For example, the q2_k_s from Nexesenex's Miqu requants is barely worse than the original Miqu q2_k, and it lets me do something else on my computer while inference is running.)
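For anyone in a similar spot: the RAM/VRAM split described above is controlled by how many layers you offload to the GPU. A minimal sketch with llama-cpp-python, assuming a local Q2_K GGUF (the file name and layer count here are hypothetical, tune them to your hardware):

```python
from llama_cpp import Llama

# Partial offload: n_gpu_layers controls how many transformer layers
# live in VRAM; the rest stay in system RAM. Lowering it leaves
# headroom for other applications at the cost of slower inference.
llm = Llama(
    model_path="MiquMaid-v2-70B.q2_k.gguf",  # hypothetical local path
    n_gpu_layers=40,   # reduce if VRAM fills up; 0 = CPU only
    n_ctx=4096,        # smaller context also saves memory
)

out = llm("### Instruction: Say hello.\n### Response:", max_tokens=32)
print(out["choices"][0]["text"])
```

The same knob exists on the llama.cpp CLI as `-ngl` / `--n-gpu-layers`.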

Not speaking for NeverSleep, but I do have a bunch of imatrix quants running for 70B & 70B-DPO; the first ones should be up within an hour.
V1 versions are available at Kooten/MiquMaid-v1-70B-IQ2-GGUF.

NeverSleep org

Thanks for that! stduhpf, you can use his quants; not sure if Q2_K_S would be usable tbh, IQ2 would be the way to go.

Thank you, but sadly IQ2 quants don't work on the Vulkan or OpenCL backends of llama.cpp, so I can't offload them to my consumer AMD GPU...
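Until those backends support the IQ quant types, one workaround is to attempt the IQ2 file and fall back to a K-quant that Vulkan/OpenCL can run. A rough sketch with llama-cpp-python; both file names are hypothetical, and the broad `except` is deliberate since the exact error raised for an unsupported quant varies by version and backend:

```python
from llama_cpp import Llama

def load_with_fallback(iq2_path: str, q2k_path: str, gpu_layers: int = 40) -> Llama:
    """Try the smaller IQ2 file first; fall back to Q2_K if the
    backend (e.g. Vulkan/OpenCL) can't handle that quant type."""
    try:
        return Llama(model_path=iq2_path, n_gpu_layers=gpu_layers)
    except Exception:
        # IQ quants need backend kernels that Vulkan/OpenCL lacked at
        # the time of this thread; K-quants are supported more widely.
        return Llama(model_path=q2k_path, n_gpu_layers=gpu_layers)

llm = load_with_fallback(
    "MiquMaid-v2-70B.IQ2_XS.gguf",  # hypothetical path
    "MiquMaid-v2-70B.q2_k.gguf",    # hypothetical path
)
```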

NeverSleep org

@Kooten you are a legend as always, thx!
