
IQ2_XS / IQ2_XXS & IQ3_XXS are currently unusable on ooba [WIP]

#1
by Undi95 - opened
NeverSleep org

I will close this when it's fixed, sorry about that!

NeverSleep org

Can confirm there are no problems with the latest llama.cpp and koboldcpp; the issue seems to be with ooba.

Undi95 changed discussion title from IQ2_XS / IQ2_XXS & IQ3_XXS are currently unusable [WIP] to IQ2_XS / IQ2_XXS & IQ3_XXS are currently unusable on ooba [WIP]

Also unusable on the current lm-studio (v0.2.14).

I tried the IQ2_XS; all I get is garbage output on KoboldCPP, both in the base client and SillyTavern. I am using Nexesenex's latest build of KoboldCPP, which adds the IQ functionality and quadratic sampling. Other Miqu models work fine on that build.

Gave the 1x70B Q4_K_M version of MiquMaid-v2 DPO a try to see if it works. It does.

@alpindale, which IQ are you using?

NeverSleep org
edited Feb 8

I can confirm that the XS/XXS quants only work on Kobold.cpp on my side at the moment.
I'm sorry for Ooba/lm-studio and other LLM backend users, but those backends need to update their bundled llama.cpp first; it's out of our reach.
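
For users on those backends, one quick way to tell whether the bundled llama.cpp is new enough is to try loading the quant directly with llama-cpp-python (a minimal sketch; the model path is a placeholder for whichever IQ file you downloaded):

```python
# Smoke test: can this llama.cpp build load and run an IQ quant sanely?
# Assumes a recent llama-cpp-python whose bundled llama.cpp has IQ2/IQ3 support.
from llama_cpp import Llama

# Placeholder path: point this at the IQ2_XS / IQ2_XXS file you downloaded.
llm = Llama(model_path="MiquMaid-v2-2x70B-DPO.IQ2_XS.gguf", n_ctx=512)

out = llm("The quick brown fox", max_tokens=16)
print(out["choices"][0]["text"])
# Coherent text means the build handles the quant; garbage or a crash means it doesn't.
```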

NeverSleep's IQs don't work for me; I get garbage. The same goes for Kooten's IQ quant of the 70B MiquMaid-v2 DPO. Nexesenex's IQ2_XXS version seems to work. Gotta go, I have an appointment, so I can't test more at the moment.

▅ffect extremelyOU Switzerland фі DRchar memb foreign courtsdeploy vb constructorirus fullygov

NeverSleep org

The IQ2_XS from this repo works for me on Koboldcpp though, that's weird.

MiquMaid-v2-2x70B-DPO.IQ2_XXS.gguf and MiquMaid-v2-2x70B-DPO.IQ2_XS.gguf are broken for me too on llama.cpp (build 2144).

When compiled with LLAMA_DEBUG it fails with main: ggml.c:11508: ggml_compute_forward_soft_max_f32: Assertion `!isnan(wp[i])' failed.

This is similar to the issue I experienced with ikawrakow/mixtral-instruct-8x7b-quantized-gguf.
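
For anyone wondering what that assertion guards: ggml's softmax forward pass refuses to proceed if any of its input weights is NaN, since a single NaN (presumably produced here by broken IQ2 dequantization in the affected builds) would silently poison the whole probability distribution. A rough Python illustration of the same check, not the actual ggml code, which operates on raw float buffers:

```python
import math

def softmax_with_nan_guard(wp):
    # Same spirit as ggml's `Assertion !isnan(wp[i])`: bail out if any logit
    # is NaN, because exp() and normalization would spread it everywhere.
    assert all(not math.isnan(x) for x in wp), "NaN in softmax input"
    m = max(wp)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in wp]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_with_nan_guard([1.0, 2.0, 3.0]))      # fine
print(softmax_with_nan_guard([1.0, float("nan")]))  # trips the assertion
```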
