Q6_K is slower than Q8_0!

#17
by siddhesh22 - opened

[Attached image: speed_compare.png]

My setup: RTX 3060 (12 GB VRAM), 32 GB system RAM, Intel Core i5-12400F.

Owner

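This is very much expected. Q8_0 is a simple "legacy" quant: each block is one scale plus 8-bit values, so dequantizing takes only a few calculations. Q6_K is a lot more complicated: each 6-bit value has to be reassembled from separately packed bit fields and then scaled by both a sub-block scale and a super-block scale, which requires the GPU to do more work per weight.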
Hopefully there will be speed improvements to both in the future.
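
To make the difference concrete, here is a rough pure-Python sketch of what dequantizing one block involves in each format. It is an illustration under stated assumptions, not the actual kernel: the function names (`dequant_q8_0`, `pack_q6`, `dequant_q6_k`) are hypothetical, and the Q6_K bit packing is simplified to a sequential layout rather than ggml's real interleaved one. The point is only the amount of work per weight.

```python
def dequant_q8_0(d, qs):
    """Q8_0-style block: one scale d and signed 8-bit values. One multiply per weight."""
    return [d * q for q in qs]


def pack_q6(values):
    """Pack 6-bit values (0..63) into a low-4-bit array and a high-2-bit array.

    Simplified sequential layout (assumption): two low nibbles per byte in ql,
    four 2-bit fields per byte in qh.
    """
    ql = bytearray(len(values) // 2)
    qh = bytearray(len(values) // 4)
    for i, v in enumerate(values):
        ql[i // 2] |= (v & 0x0F) << (4 * (i % 2))
        qh[i // 4] |= (v >> 4) << (2 * (i % 4))
    return ql, qh


def dequant_q6_k(d, scales, ql, qh):
    """Q6_K-style super-block: 256 weights, 16 sub-block scales, one super-scale d."""
    out = []
    for i in range(256):
        low = (ql[i // 2] >> (4 * (i % 2))) & 0x0F   # low 4 bits
        high = (qh[i // 4] >> (2 * (i % 4))) & 0x03  # high 2 bits
        q = (low | (high << 4)) - 32                 # recenter to -32..31
        out.append(d * scales[i // 16] * q)          # two scales per weight
    return out


# Round-trip a toy super-block with unit scales.
values = [i % 64 for i in range(256)]
ql, qh = pack_q6(values)
restored = dequant_q6_k(1.0, [1] * 16, ql, qh)
```

Per weight, the Q8_0 path is a single multiply, while the Q6_K path needs two loads, several shifts and masks, a subtraction, and two multiplies. Real implementations vectorize all of this, but the extra unpacking steps do not go away.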

city96 changed discussion status to closed
