Observation
I've noticed a surprisingly big difference in quality between "Cydonia-22B-v1-Q4_K_M.gguf" and "Cydonia-22B-v1-Q6_K.gguf". Usually the difference between quants of other models is subtle; here it's really big. "Cydonia-22B-v1-Q5_K_M.gguf" sits right between those two in quality (which makes sense).
I'm writing this because in my initial tests of "Cydonia-22B-v1-Q4_K_M.gguf" I thought the model was quite weak, and if I hadn't tested Q5 and Q6 I would have kept that opinion. So if someone is disappointed with the Q4 version, they should try the higher quants.
Overall, it came out really nice. Unfortunately, I can't run the Q6 version with a reasonably sized context... maybe someday. Thanks for the model though.
Thank you!
Have you tried the quants from bartowski? https://huggingface.co/bartowski/Cydonia-22B-v1-GGUF
Or the imatrixes? https://huggingface.co/MarsupialAI/Cydonia-22B-v1_iMat_GGUF
Speaking from my POV with only 12 GB of VRAM, I had better results with the IQ3_XS (from bartowski) than with the Q4_K_M for some reason. The fact that one fits fully on the GPU with 16K context (and 8-bit cache compression) while the other doesn't might color my perception. I had to tune down the XTC sampler a lot to get good results.
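For anyone wanting to try that setup, here's a rough sketch of the kind of llama.cpp invocation involved (the file name and exact flags are assumptions based on a recent llama-server build, not a verified recipe; quantizing the V cache needs flash attention enabled):

```
# Sketch: fully offload the IQ3_XS quant on a 12 GB card with a 16K
# context window and an 8-bit KV cache. The model file name assumes the
# bartowski repo linked above; flags are for a recent llama-server build.
llama-server \
  -m Cydonia-22B-v1-IQ3_XS.gguf \
  -c 16384 \
  -ngl 99 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

`-c 16384` sets the 16K context, `-ngl 99` offloads all layers to the GPU, and the two `--cache-type-*` flags give the "8bit cache compression" mentioned above. XTC sampler settings live in whatever frontend or sampler flags your backend exposes, so they're not shown here.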
I tried the quants from bartowski; his _L versions are somehow slightly better than the usual ones.
SerialKicked: iMat? I'll have a look at the iMat quants. Thanks.
"Cydonia-22B-v1_iQ4nl.gguf" from https://huggingface.co/MarsupialAI/Cydonia-22B-v1_iMat_GGUF seems to be doing a bit better than "Q5km" in tests, and in the story is more consistent (although the quality of prose is sometimes worse)...
On average the "iQ4nl" is better than the "Q5km", but also slightly more chaotic in individual trials... nonetheless very interesting.
Damn, I'll have to take note. Maybe I should prioritize iMatrix quants, especially for my Mistral models...