Can this model be quantized using bitsandbytes or some other method?
#30, opened by RonanMcGovern
I see that GGUF quantization is possible, which means the model can be run with llama.cpp. Is there a way to do the same with transformers or sentence-transformers, for example using bitsandbytes? Thanks