IQ2_XS / IQ2_XXS & IQ3_XXS are currently unusable on ooba [WIP]
I will close this when it's fixed, sorry about that!
Can confirm no problems with the latest llama.cpp and koboldcpp; the issue seems to be with ooba.
Also unusable on the current lm-studio (v0.2.14).
I tried the IQ2_XS, and all I get is garbage output on KoboldCPP, both in the base client and Silly Tavern. I am using Nexesenex's latest build of KoboldCPP, which adds the IQ functionality and quadratic sampling. Other Miqu models work fine on that.
Gave the 1x70B Q4_K_M version of MiquMaid-v2 DPO a try to see if it works. It does.
@alpindale, which IQ are you using?
I confirm that the XS/XXS quants only work on Kobold.cpp on my side atm.
I'm sorry for Ooba/lm-studio and other LLM backend users, but for now those projects need to update their bundled llama.cpp themselves first; that's out of our reach.
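In the meantime, anyone who wants to test these quants against upstream directly can build llama.cpp from source. A minimal sketch, assuming a Linux/macOS shell with git and make installed:

```sh
# Build the current upstream llama.cpp, which includes the IQ2/IQ3 quant kernels.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```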
NeverSleep's IQs don't work for me; I get garbage. Same goes for Kooten's IQ of the 70B MiquMaid-v2 DPO. Nexesenex's IQ2_XXS seems to work. Gotta go, got an appointment, so I can't test more at the moment.
Example of the garbage output: ▅ffect extremelyOU Switzerland фі DRchar memb foreign courtsdeploy vb constructorirus fullygov
The IQ2_XS from this repo works for me on Koboldcpp tho, that's weird.
MiquMaid-v2-2x70B-DPO.IQ2_XXS.gguf and MiquMaid-v2-2x70B-DPO.IQ2_XS.gguf are broken for me too on llama.cpp (build 2144).
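For reference, this is reproducible with a plain generation run against either file; a minimal sketch (prompt and token count are arbitrary, and the model path assumes the file is in the current directory):

```sh
# On a broken quant this produces garbage tokens instead of coherent text.
./main -m MiquMaid-v2-2x70B-DPO.IQ2_XXS.gguf -p "Hello, my name is" -n 32
```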
When compiled with LLAMA_DEBUG, it fails with: main: ggml.c:11508: ggml_compute_forward_soft_max_f32: Assertion `!isnan(wp[i])' failed.
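To hit that assertion rather than just garbage output, rebuild with the Makefile's LLAMA_DEBUG switch so plain C assert() checks stay enabled, then rerun; a sketch under the same assumptions as above:

```sh
# Debug build: NDEBUG is not defined, so the isnan assert
# in ggml_compute_forward_soft_max_f32 can fire.
make clean
make LLAMA_DEBUG=1
./main -m MiquMaid-v2-2x70B-DPO.IQ2_XXS.gguf -p "Hello" -n 16
```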
This is similar to the issue I experienced with ikawrakow/mixtral-instruct-8x7b-quantized-gguf.