Quantized GGUF / EXL2 please?
#1 opened by siddhesh22
Is it possible to quantize this? Would appreciate it, I only have 12GB of VRAM.
This model was trained and validated under 4-bit quantization with bitsandbytes, using double quantization and the NF4 format.
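For anyone who wants to reproduce those settings, here is a minimal sketch of loading the model in 4-bit with bitsandbytes via transformers. The model ID is a placeholder and the compute dtype is my assumption, not something the author confirmed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 with double quantization, matching the settings described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 format
    bnb_4bit_use_double_quant=True,      # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption; not stated by the author
)

model_id = "your-org/this-model"  # placeholder: substitute the actual repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Note that this only loads the language model; as explained below, the pipeline also needs four extra vision models, so 12GB is likely still too little.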
I haven't quantized this model with GGUF or EXL2 myself, only with bitsandbytes. In my experience, 12GB of VRAM is too little even at 4-bit, because four additional computer vision models must be loaded alongside the language model.
Therefore, I think it may be impossible to run with only 12GB of VRAM.