Only q8_0 is working. I think maybe the moe merges have this flaw.

#1
by ishanparihar - opened

Only q8_0 is working. I think maybe the MoE merges have this flaw. Let me know if anyone has any insights regarding this.

Owner
edited Mar 3

I'm personally using the Q5_K_M without any issue inferencing through llama.cpp (TGWUI and Jan).
Which ones did you try and what inference engine did you use?

Owner
edited Mar 3

I just found Q6_K is broken! It only outputs boxes!
eg. "33A

//EMI

2I,IMA8E#?E?E,(Q IQ.22E3A"

But in the current state, at least these 3 are tested and working fine (unluckily, those were the only ones I had tested and kept for myself, so I never noticed the issue):

  • Q4_K_M
  • Q5_K_S
  • Q5_K_M

Maybe I mixed up two scripts or ran out of storage when converting this specific one. If needed, I'd be happy to requantize, as I kept the F16 model on my drive.
But it would be really helpful if you let me know which broken ones you tried.
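For reference, requantizing from a kept F16 GGUF is a single llama.cpp command. A rough sketch (file paths are placeholders, and the binary name depends on the llama.cpp build: older builds ship it as `quantize`, newer ones as `llama-quantize`):

```shell
# Re-quantize the F16 GGUF to Q6_K (paths are placeholders)
./llama-quantize model-f16.gguf model-Q6_K.gguf Q6_K
```

Since quantization is deterministic for a given input and quant type, a clean rerun from the same F16 should either reproduce the broken file or confirm the first conversion was corrupted (e.g. by running out of disk space mid-write).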

Owner

OK, I retested every quant, so really bad luck if you only tried the Q6_K before the Q8_0, as it was the only broken one!
I'm gonna requantize it and reupload if everything goes right this time. I'll let you know ;)
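A quick smoke test per quant is usually enough to catch this kind of breakage, since a corrupted quant tends to output garbage from the very first tokens. A minimal sketch with llama.cpp's CLI (model path is a placeholder; the binary is `llama-cli` in recent builds, `main` in older ones):

```shell
# Short generation as a sanity check; broken quants produce
# garbage (boxes / random symbols) immediately
./llama-cli -m model-Q6_K.gguf -p "Hello, how are you?" -n 32
```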

Owner
edited Mar 3

Q6_K requantized and working as intended this time!
So far, the only one I didn't test is the Q8_0. But with your feedback on this one, we've now covered it all :)

Thanks for pointing out the issue!

I'm closing this.

owao changed discussion status to closed
