Q6_K is slower than Q8_0!
#17, opened by siddhesh22
This is very much expected. Q8_0 is a simple "legacy" quant that only needs a few calculations to unpack each weight, while Q6_K is a lot more complicated and requires the GPU to do more work per weight.
Hopefully there will be speed improvements to both in the future.
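To make the difference in per-weight work a bit more concrete, here is a rough Python sketch. It is not the actual GGML/llama.cpp kernel code, and the byte layout is deliberately simplified: `pack_6bit`, `dequant_6bit_block`, and the `group_size` parameter are hypothetical names made up for illustration. The assumption it illustrates is that a Q8_0-style block stores one scale plus plain signed 8-bit values, while a Q6_K-style block has to reassemble each 6-bit value from separately stored low and high bits and then apply both a per-group scale and a block-level scale.

```python
# Simplified illustration only. The packing used here is NOT the exact
# GGML memory layout; it just shows the kind of extra work involved.

def dequant_q8_0_block(d, qs):
    """Q8_0-style: one scale per block, one signed 8-bit value per weight."""
    return [d * q for q in qs]


def pack_6bit(values):
    """Hypothetical packing: split each 6-bit value (0..63) into a low
    nibble (two per byte) and two high bits (four per byte)."""
    assert len(values) % 4 == 0
    ql = []  # low 4 bits, two values per byte
    qh = []  # high 2 bits, four values per byte
    for i in range(0, len(values), 2):
        ql.append((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4))
    for i in range(0, len(values), 4):
        byte = 0
        for j in range(4):
            byte |= ((values[i + j] >> 4) & 0x3) << (2 * j)
        qh.append(byte)
    return ql, qh


def dequant_6bit_block(d, scales, ql, qh, group_size=16):
    """Q6_K-style (simplified): rebuild each 6-bit value from two arrays,
    re-center it, then apply a per-group scale and a block scale."""
    out = []
    for i in range(len(ql) * 2):
        low = (ql[i // 2] >> (4 * (i % 2))) & 0xF   # extract low nibble
        high = (qh[i // 4] >> (2 * (i % 4))) & 0x3  # extract high 2 bits
        q = (low | (high << 4)) - 32                # 0..63 -> -32..31
        out.append(d * scales[i // group_size] * q)
    return out


# Round-trip check on the simplified packing.
values = list(range(64))
ql, qh = pack_6bit(values)
restored = dequant_6bit_block(1.0, [1, 1, 1, 1], ql, qh)
assert restored == [float(v - 32) for v in values]
```

The 8-bit path is a single multiply per weight, whereas the 6-bit path needs several shifts, masks, an OR, a subtraction, and two multiplies. Real GPU kernels vectorize and fuse these steps, so the actual gap depends on the implementation and hardware.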
city96 changed discussion status to closed