New exciting quant method
#3 opened by Yhyu13
Hi @TheBloke, this promising method (HQQ) appears to be very fast at producing quantized models (claimed to be 50x faster than generating a GPTQ quant of Llama 2 70B) and requires NO calibration data. It's worth your attention.
PS
https://mobiusml.github.io/hqq_blog/
https://github.com/oobabooga/text-generation-webui/pull/4888
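For context, here is a minimal NumPy sketch of the half-quadratic idea the blog describes: weight-only, calibration-free quantization, where the zero-point is optimized against an outlier-robust l_p (p<1) error via half-quadratic splitting. This is only my own illustration, not the authors' implementation; all function names and hyperparameters below are mine.

```python
import numpy as np

def shrink_lp(x, beta, p=0.7):
    """Proximal step for the l_p (p<1) penalty: a generalized soft-threshold."""
    eps = 1e-8
    return np.sign(x) * np.maximum(np.abs(x) - (p / beta) * (np.abs(x) + eps) ** (p - 1), 0.0)

def quantize_hqq_style(W, bits=4, iters=20, beta=10.0, kappa=1.01, p=0.7):
    """Calibration-free quantization sketch: alternate rounding, a shrinkage
    step on the residual, and a closed-form zero-point update."""
    qmax = 2 ** bits - 1
    # Per-row min/max init, identical to plain round-to-nearest (RTN)
    wmin = W.min(axis=1, keepdims=True)
    wmax = W.max(axis=1, keepdims=True)
    s = (wmax - wmin) / qmax
    z = -wmin / s
    We = np.zeros_like(W)                  # auxiliary "outlier error" variable
    for _ in range(iters):
        Wq = np.clip(np.round(W / s + z), 0, qmax)
        Wr = (Wq - z) * s                  # dequantized weights
        We = shrink_lp(W - Wr, beta, p)    # absorb large residuals
        # Closed-form zero-point: argmin_z ||(W - We) - (Wq - z) * s||^2
        z = np.mean(Wq - (W - We) / s, axis=1, keepdims=True)
        beta *= kappa                      # anneal the splitting penalty
    Wq = np.clip(np.round(W / s + z), 0, qmax)
    return Wq, s, z

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))

# RTN baseline at 4 bits for comparison
qmax = 15
s0 = (W.max(axis=1, keepdims=True) - W.min(axis=1, keepdims=True)) / qmax
z0 = -W.min(axis=1, keepdims=True) / s0
W_rtn = (np.clip(np.round(W / s0 + z0), 0, qmax) - z0) * s0

Wq, s, z = quantize_hqq_style(W)
W_hqq = (Wq - z) * s
err_rtn = np.sqrt(((W - W_rtn) ** 2).mean())
err_hqq = np.sqrt(((W - W_hqq) ** 2).mean())
print(f"RMS error  RTN: {err_rtn:.4f}  HQQ-style: {err_hqq:.4f}")
```

The key point, as I understand the blog, is that the whole optimization only touches the weights themselves (no activations, no calibration set), which is why it is so much cheaper to run than GPTQ.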
The quantized models are available now:
Base: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ
Instruct: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ