How is it possible that Q4_K_M performs better than any Q5, Q6 and even Q8?
I'm a bit shocked. I work with this model every day, and, surprisingly, I've found that the higher the quantization, the worse the answers I get.
I mainly use it to help me write articles. Q4_K_M handles every task, while every other quantization fails on the same tasks with the same parameters. Can anyone explain this to me?
I'm not claiming there is an error in the quantization process, but it seems impossible that higher-quality quantizations perform so much worse.
I can't run anything above Q4_K_M on my 8GB M1 MacBook, which is why I'm using this one. However, I was able to run the higher quantizations of this model on the CPU of my dedicated server, and none of the experiments I've run with them since the model's release has produced usable results.
I'm confused, because there must be some explanation. I can't post an example here, since I would have to publish long articles of 1,500 to 1,800 words for comparison. But you can check this yourself.
Well, Q4 won't be worse at everything, and on some tasks it might be slightly better than higher quantizations. In general, though, higher quantizations will be better than Q4, for almost every task.
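To illustrate why higher quantizations are usually better overall, here is a minimal sketch that rounds random weights to 4, 5, 6, and 8 bits and measures the average reconstruction error. It assumes a simplified symmetric per-block absmax scheme with a hypothetical block size of 32 and Gaussian-distributed weights; this is not llama.cpp's actual K-quant format, which uses super-blocks with their own quantized scales and, in the _M variants, mixes bit widths across tensors.

```python
import random

def quantize_roundtrip(weights, bits, block_size=32):
    """Simplified symmetric absmax quantization per block, then dequantize.
    Assumption: NOT llama.cpp's K-quant layout, just an illustration."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 levels each side for 4-bit, 127 for 8-bit
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # One scale per block; fall back to 1.0 for an all-zero block
        scale = max(abs(w) for w in block) / qmax or 1.0
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))  # integer code
            out.append(q * scale)  # dequantized value
    return out

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical weights: small values drawn from a normal distribution
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
for bits in (4, 5, 6, 8):
    restored = quantize_roundtrip(weights, bits)
    print(f"{bits}-bit MSE: {mean_squared_error(weights, restored):.3e}")
```

The error shrinks as the bit width grows, which is why higher quantizations generally come out ahead. Lower average weight error does not guarantee a better answer on every individual prompt, though.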