New exciting quant method
#3 opened by Yhyu13
Hi @TheBloke, this promising method (HQQ) appears to be very fast at producing quantized models (claimed to be 50x faster than generating a GPTQ quant of Llama 2 70B) and requires NO calibration data. It's worth your attention.
PS
https://mobiusml.github.io/hqq_blog/
https://github.com/oobabooga/text-generation-webui/pull/4888
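For context, here is a minimal NumPy sketch of the half-quadratic idea the blog describes: weight-only, calibration-free quantization, where the zero-point is optimized against an outlier-robust l_p (p<1) error via half-quadratic splitting. This is only my own illustration, not the authors' implementation; all function names and hyperparameters below are mine.

```python
import numpy as np

def shrink_lp(x, beta, p=0.7):
    """Proximal step for the l_p (p<1) penalty: a generalized soft-threshold."""
    eps = 1e-8
    return np.sign(x) * np.maximum(np.abs(x) - (p / beta) * (np.abs(x) + eps) ** (p - 1), 0.0)

def quantize_hqq_style(W, bits=4, iters=20, beta=10.0, kappa=1.01, p=0.7):
    """Calibration-free quantization sketch: alternate rounding, a shrinkage
    step on the residual, and a closed-form zero-point update."""
    qmax = 2 ** bits - 1
    # Per-row min/max init, identical to plain round-to-nearest (RTN)
    wmin = W.min(axis=1, keepdims=True)
    wmax = W.max(axis=1, keepdims=True)
    s = (wmax - wmin) / qmax
    z = -wmin / s
    We = np.zeros_like(W)                  # auxiliary "outlier error" variable
    for _ in range(iters):
        Wq = np.clip(np.round(W / s + z), 0, qmax)
        Wr = (Wq - z) * s                  # dequantized weights
        We = shrink_lp(W - Wr, beta, p)    # absorb large residuals
        # Closed-form zero-point: argmin_z ||(W - We) - (Wq - z) * s||^2
        z = np.mean(Wq - (W - We) / s, axis=1, keepdims=True)
        beta *= kappa                      # anneal the splitting penalty
    Wq = np.clip(np.round(W / s + z), 0, qmax)
    return Wq, s, z

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))

# RTN baseline at 4 bits for comparison
qmax = 15
s0 = (W.max(axis=1, keepdims=True) - W.min(axis=1, keepdims=True)) / qmax
z0 = -W.min(axis=1, keepdims=True) / s0
W_rtn = (np.clip(np.round(W / s0 + z0), 0, qmax) - z0) * s0

Wq, s, z = quantize_hqq_style(W)
W_hqq = (Wq - z) * s
err_rtn = np.sqrt(((W - W_rtn) ** 2).mean())
err_hqq = np.sqrt(((W - W_hqq) ** 2).mean())
print(f"RMS error  RTN: {err_rtn:.4f}  HQQ-style: {err_hqq:.4f}")
```

The key point, as I understand the blog, is that the whole optimization only touches the weights themselves (no activations, no calibration set), which is why it is so much cheaper to run than GPTQ.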
The quantized models are available now:
Base: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ
Instruct: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ