Smaller quants
#1, opened by saishf
May we have an IQ4_NL/Q4_K_S? It fits perfectly into 8 GB of VRAM with 16K context and a Q4 cache, at about 7.6 GB of VRAM utilization.
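Part of why a 16K context fits is that quantizing the KV cache shrinks it a lot. The rough estimate below is a sketch under stated assumptions. It assumes llama.cpp's `q4_0` cache type, which stores 32 values in 18 bytes (16 bytes of 4-bit codes plus a 2-byte fp16 scale). It also uses a hypothetical Llama-3-8B-like architecture (32 layers, 8 KV heads, head dimension 128), not the actual model in this thread. Weights and compute buffers come on top of this figure.

```python
from fractions import Fraction

# Bytes per stored cache element:
#   f16  -> 2 bytes
#   q4_0 -> 32 values packed into 18 bytes (16 bytes of 4-bit codes + 2-byte fp16 scale)
BYTES_PER_ELEM = {
    "f16": Fraction(2),
    "q4_0": Fraction(18, 32),
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Estimate KV-cache size: one K and one V vector per layer, per KV head, per token."""
    n_elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return int(n_elems * BYTES_PER_ELEM[cache_type])


# HYPOTHETICAL architecture (Llama-3-8B-like), for illustration only.
ARCH = dict(n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    MiB = 1024 ** 2
    for cache_type in ("f16", "q4_0"):
        size = kv_cache_bytes(n_ctx=16384, cache_type=cache_type, **ARCH)
        print(f"{cache_type}: {size / MiB:.0f} MiB")
```

Under these assumptions, a 16K context needs about 2048 MiB of cache at f16 but only about 576 MiB at q4_0. That difference of roughly 1.4 GiB is what lets a ~4.5-bit model plus a long context squeeze under 8 GB.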
Of course - uploading the Q4_K_S now. Should be ready shortly.
Thank you!
No problem, Q4_K_S is up. Adding IQ4_NL and IQ4_XS too as those should be a little bit smaller in case you're trying to lower VRAM utilization.
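To see roughly how much smaller the IQ4 variants are, here is a back-of-the-envelope sketch. It uses the nominal bits per weight implied by the ggml block layouts. IQ4_NL stores 32 weights in 18 bytes (4.5 bpw). IQ4_XS stores 256 weights in 136 bytes, which is 128 bytes of codes plus 8 bytes of scales (4.25 bpw). The 8.0B parameter count is hypothetical. Real GGUF files keep some tensors at other types, so actual file sizes will differ somewhat.

```python
# Nominal bits per weight from the ggml block layouts (approximate for whole files,
# since some tensors such as embeddings may be stored at a different type).
BITS_PER_WEIGHT = {
    "IQ4_NL": 18 * 8 / 32,    # 32 weights per 18-byte block -> 4.5 bpw
    "IQ4_XS": 136 * 8 / 256,  # 256 weights per 136-byte super-block -> 4.25 bpw
}


def weight_gib(n_params, quant):
    """Approximate weight storage in GiB for a given parameter count and quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1024 ** 3


# HYPOTHETICAL model size, for illustration only.
N_PARAMS = 8.0e9

if __name__ == "__main__":
    for quant in ("IQ4_NL", "IQ4_XS"):
        print(f"{quant}: {weight_gib(N_PARAMS, quant):.2f} GiB")
```

For the hypothetical 8.0B model, the 0.25 bpw gap works out to roughly a quarter of a GiB of weights saved by IQ4_XS over IQ4_NL.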