Create a GGUF for this, please.
@LoneStriker Can you quantize this model to exl2?
@MaziyarPanahi
@LoneStriker
Can you confirm that these quants work on your end?
Whenever I tested the Q6_K and Q8_0 quants, they produced only gibberish.
Only the Q4_K_M from the original author works for me at the moment.
Thanks for your efforts for the open-source community.
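For reference, a minimal sketch of the kind of sanity check I mean, using the llama-cpp-python bindings; the model filename and generation settings are placeholders, not my exact setup.

```python
# Minimal sanity check for a GGUF quant with llama-cpp-python.
# Assumes `pip install llama-cpp-python`; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q6_K.gguf",  # placeholder: point this at the quant under test
    n_ctx=2048,
    verbose=False,
)

out = llm("The capital of France is", max_tokens=16, temperature=0.0)
print(out["choices"][0]["text"])  # gibberish here indicates a broken quant
```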
Hi
@ishanparihar
There have been a lot of changes to MoE models introduced recently in Llama.cpp. I think it's best if I make a new quant with a new build and test this again.
Thanks @MaziyarPanahi for your prompt response and your willingness to help. Looking forward to the new builds.
My own (ZeroWw) quantizations: the output and embedding tensors are quantized to f16, and all other tensors are quantized to q5_k or q6_k.
Result: both the f16/q6 and f16/q5 variants are smaller than the standard q8_0 quantization, and they perform as well as the pure f16 model.
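A minimal sketch of how such a mixed quant could be produced with llama.cpp's llama-quantize tool, assuming a recent build that supports the `--output-tensor-type` and `--token-embedding-type` flags; all file names and paths are placeholders.

```python
# Sketch: produce an "f16 output/embeddings, q6_k everything else" quant
# by shelling out to llama.cpp's llama-quantize. Paths and the binary
# location are placeholders; the flags assume a recent llama.cpp build.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--output-tensor-type", "f16",    # keep the output tensor at f16
        "--token-embedding-type", "f16",  # keep the token embeddings at f16
        "model-f16.gguf",                 # placeholder: full-precision input GGUF
        "model-f16-q6_k.gguf",            # placeholder: output file
        "Q6_K",                           # base type for all remaining tensors
    ],
    check=True,
)
```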