Missing quant_config.json
#8, opened by ThWu
I'm setting up an Arctic vLLM endpoint following the tutorial at https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/inference/vllm. However, I was not able to enable quantization="deepspeedfp" because of `ValueError: Cannot find the config file for deepspeedfp`. Without quantization, the model runs out of memory (OOM) even on 8 A100s.
The fix is to add a quant_config.json file to the model dir:
```
{
  "bits": 8,
  "rounding": "nearest",
  "mantissa_bits": 3,
  "group_size": 512
}
```
Could you guys upload it?
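In the meantime, for anyone hitting the same error, here is a minimal sketch of the workaround in Python. It writes the config above into a local copy of the model. The helper name `write_quant_config` and the model directory path are my own placeholders, not part of the tutorial, and the values are just the ones quoted above.

```python
import json
from pathlib import Path

# Values copied from the quant_config.json contents quoted in this post.
QUANT_CONFIG = {
    "bits": 8,
    "rounding": "nearest",
    "mantissa_bits": 3,
    "group_size": 512,
}


def write_quant_config(model_dir):
    """Write quant_config.json into model_dir and return the file path.

    Hypothetical helper: model_dir is assumed to be the local directory
    that already holds the downloaded model files.
    """
    target = Path(model_dir) / "quant_config.json"
    target.write_text(json.dumps(QUANT_CONFIG, indent=2) + "\n")
    return target


# Example call (the path is a placeholder, adjust to your local model dir):
# write_quant_config("/path/to/snowflake-arctic-instruct")
```

After the file is in place, retrying with quantization="deepspeedfp" should get past the missing-config ValueError, assuming nothing else is missing from the model dir.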