Update README.md #2
opened by shariqmobin
According to the LLM Compressor docs, the `save_compressed=True` flag should be present, as shown in this example from them: https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_int8

I believe this also affects https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8.
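For context, a minimal sketch of the flow the linked example follows, assuming the quickstart-style llm-compressor API of the time (`SparseAutoModelForCausalLM`, `oneshot`, `GPTQModifier`); the model ID, calibration dataset, and save directory here are illustrative, not taken from this repo:

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # hypothetical base model
SAVE_DIR = MODEL_ID.split("/")[1] + "-quantized.w8a8"  # illustrative output path

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# One-shot INT8 weight/activation (W8A8) quantization, in the spirit of the
# linked example; calibration settings here are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
)

# The flag under discussion: save_compressed=True writes the checkpoint in
# compressed-tensors format. Per the reply below, it defaults to True anyway.
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```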
`save_compressed` is set to `True` by default.
Got it, my bad.
shariqmobin changed pull request status to closed