GPTQ plz
GPTQ plz.
It is a big model, I can see why that'd be a good idea.
+1
Hi all,
If @puffy310 hasn't started, I can give it a shot (assuming DeepseekV2ForCausalLM is supported by now in AutoGPTQ).
Try the vLLM version first, as the model devs have said the Hugging Face implementation isn't up to their standards anyway. "Everyone wants a quantized model but nobody wants to quantize a model." - Julian Herrera
I'll see if I can give it a try, but I doubt I have the know-how. DeepseekV2 was just released, and I don't know if AutoGPTQ works well with MoE architectures. If I have some time today I might as well try, but your implementation will most likely be better. I always love to learn, though. I'll post progress in this discussion.
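For anyone else who wants to attempt it, this is roughly the flow I'd try with AutoGPTQ. Untested sketch: the output directory and the one-line calibration text are placeholders, and it assumes DeepseekV2ForCausalLM is actually wired up in AutoGPTQ by now.

```python
# Rough AutoGPTQ quantization sketch -- untested for DeepseekV2.
# Plain settings dict so it's easy to tweak; 4-bit / group_size 128 is the common default.
QUANT_SETTINGS = {"bits": 4, "group_size": 128, "desc_act": False}

def quantize_deepseek_v2(model_id="deepseek-ai/DeepSeek-V2", out_dir="DeepSeek-V2-GPTQ"):
    # Imports kept inside the function so the sketch parses without the libs installed.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    quantize_config = BaseQuantizeConfig(**QUANT_SETTINGS)

    # trust_remote_code because the repo ships custom modeling code.
    model = AutoGPTQForCausalLM.from_pretrained(
        model_id, quantize_config, trust_remote_code=True
    )

    # GPTQ needs calibration examples; a real run should use a few hundred
    # samples from a corpus like C4, not this toy sentence.
    examples = [
        tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
    ]
    model.quantize(examples)

    model.save_quantized(out_dir)
    tokenizer.save_pretrained(out_dir)
```

Whether the MoE layers survive this without per-expert handling is exactly the open question.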
+1
+1
Just for a reference: https://github.com/AutoGPTQ/AutoGPTQ/issues/664
Seems not feasible in AutoAWQ as well: https://github.com/casper-hansen/AutoAWQ/issues/473
I tried building the model with AWQ. It takes a long time to rebuild the model.
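For anyone curious, the usual AutoAWQ flow looks roughly like this. Sketch only: the paths are placeholders, and per the issue linked above it may still fail on the MoE layers.

```python
# Rough AutoAWQ quantization sketch -- may still hit the MoE issues linked above.
AWQ_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

def awq_quantize(model_path="deepseek-ai/DeepSeek-V2", quant_path="DeepSeek-V2-AWQ"):
    # Imports inside the function so the sketch parses without awq installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # AutoAWQ runs its own calibration pass internally during quantize().
    model.quantize(tokenizer, quant_config=AWQ_CONFIG)

    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```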
@MaziyarPanahi AutoAWQ and GPTQModel support this model.
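If so, the GPTQModel flow should be something like this. Untested sketch: the output directory and calibration texts are placeholders, and a real run needs a proper calibration set.

```python
# Rough GPTQModel quantization sketch -- untested for DeepSeek-V2.
GPTQ_BITS = 4
GPTQ_GROUP_SIZE = 128

def gptqmodel_quantize(model_id="deepseek-ai/DeepSeek-V2", out_dir="DeepSeek-V2-gptq-4bit"):
    # Imports inside the function so the sketch parses without gptqmodel installed.
    from gptqmodel import GPTQModel, QuantizeConfig

    quant_config = QuantizeConfig(bits=GPTQ_BITS, group_size=GPTQ_GROUP_SIZE)
    model = GPTQModel.load(model_id, quant_config)

    # Placeholder calibration set; swap in a few hundred representative texts.
    calibration = ["The quick brown fox jumps over the lazy dog."] * 256
    model.quantize(calibration)

    model.save(out_dir)
```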