According to the LLM Compressor docs, the save_compressed=True flag should be present, as shown in this example from the project: https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_int8

I believe this is an issue with https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 as well.

Neural Magic org

save_compressed is set to True by default.

Got it, my bad.

shariqmobin changed pull request status to closed
