Smaller quants

#1
by saishf - opened

May we have an IQ4_NL or Q4_K_S quant? It fits perfectly into 8GB of VRAM with 16K ctx and Q4 cache, at about 7.6GB of VRAM utilization.

Of course - uploading the Q4_K_S now. Should be ready shortly.

Thank you!

No problem, Q4_K_S is up. Adding IQ4_NL and IQ4_XS too as those should be a little bit smaller in case you're trying to lower VRAM utilization.
