Smaller quants

by saishf - opened Jul 24

saishf

Jul 24

May we have an IQ4_NL or Q4_K_S? It fits perfectly into 8GB of VRAM with 16K context and a Q4 cache (~7.6GB of VRAM utilization).

Owner Jul 24

Of course - uploading the Q4_K_S now. Should be ready shortly.

saishf

Jul 24


Thank you!

Owner Jul 24

No problem, Q4_K_S is up. Adding IQ4_NL and IQ4_XS too, as those should be a little smaller in case you're trying to lower VRAM utilization.
