Missing quant_config.json

by ThWu - opened Apr 26

I'm setting up an Arctic vLLM endpoint following the tutorial at https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/inference/vllm. However, I couldn't enable `quantization="deepspeedfp"` because vLLM raised `ValueError: Cannot find the config file for deepspeedfp`. Without quantization, the model runs out of memory (OOM) even on 8 A100s.
The fix is to add the following quant_config.json to the model directory:

```json
{
    "bits": 8,
    "rounding": "nearest",
    "mantissa_bits": 3,
    "group_size": 512
}
```
Could you guys upload it?
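Until the file is in the repo, it can be created locally. Below is a minimal sketch, assuming the weights are already downloaded to a local directory. `write_quant_config` is a hypothetical helper, not part of vLLM or the Arctic repo. It writes the same values shown above:

```python
import json
from pathlib import Path
from typing import Optional

# Values copied from the quant_config.json above.
DEEPSPEEDFP_CONFIG = {
    "bits": 8,
    "rounding": "nearest",
    "mantissa_bits": 3,
    "group_size": 512,
}


def write_quant_config(model_dir: str, config: Optional[dict] = None) -> Path:
    """Write quant_config.json into a local model directory (hypothetical helper)."""
    model_path = Path(model_dir)
    if not model_path.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_path}")
    target = model_path / "quant_config.json"
    # Fall back to the deepspeedfp settings when no config is passed in.
    payload = DEEPSPEEDFP_CONFIG if config is None else config
    target.write_text(json.dumps(payload, indent=4) + "\n")
    return target
```

For example, `write_quant_config("/models/snowflake-arctic-instruct")` (placeholder path) writes the file in place. vLLM can then be pointed at that directory with `quantization="deepspeedfp"` as in the tutorial.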
