shariqmobin committed
Commit • d6f7c27
Parent(s): e6a4ff7
Update README.md
According to the LLM Compressor docs, the save_compressed=True flag should be present, as shown in this example from them: https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_int8
I believe this is also an issue with https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8.
README.md CHANGED

@@ -126,7 +126,7 @@ oneshot(
     num_calibration_samples=num_samples,
 )
 
-model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8")
+model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8", save_compressed=True)
 ```
 
 