shariqmobin committed
Commit • d6f7c27
Parent(s): e6a4ff7
Update README.md
According to the LLM Compressor docs, the save_compressed=True flag should be present, as shown in this example from them: https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_int8
I believe this is also an issue with https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8.
README.md CHANGED

@@ -126,7 +126,7 @@ oneshot(
     num_calibration_samples=num_samples,
 )
 
-model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8")
+model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8", save_compressed=True)
 ```
 
 