Need help with quantizing the original model
#2, opened by aaryaman
Can someone provide instructions on how the original model can be quantized?
I downloaded the model from microsoft/phi-2 and tried to quantize it using the scripts in llama.cpp, but got an error. I then realized that llama.cpp does not support this model yet.
Any insights or suggestions would be greatly appreciated.
Hi @aaryaman,
Here are the steps. You need Rust installed on your system.
git clone https://github.com/huggingface/candle
cd candle
cargo build
# optional: this makes sure the original model is in `~/.cache/huggingface/hub/models--microsoft--phi-2/`
# run the phi 2 example so that Candle downloads the weights for you
cargo run --example phi --release -- --prompt "USER: What would you do on a sunny day in Paris?\nASSISTANT:" --sample-len 200 --model 2
# then run the quantization
mkdir phi-2-quantized
# q40 (to produce the q4k variant instead, pass --quantization q4k and name the output file accordingly)
cargo run --example tensor-tools -- quantize ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/d3186761bf5c4409f7679359284066c25ab668ee/model-0000*-of-00002.safetensors --quantization q40 --out-file phi-2-quantized/model-v2-q40.gguf
# q80
cargo run --example tensor-tools -- quantize ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/d3186761bf5c4409f7679359284066c25ab668ee/model-0000*-of-00002.safetensors --quantization q80 --out-file phi-2-quantized/model-v2-q80.gguf
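The snapshot hash in the paths above (`d3186761...`) belongs to one specific revision of the model, so it may differ on your machine. Here is a minimal sketch for looking it up instead of hardcoding it, assuming the standard Hugging Face cache layout where `refs/main` stores the commit hash of the downloaded revision. The `snapshot_dir` helper and the `MODEL_CACHE` variable are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print the snapshot directory that refs/main points to.
# Assumes the usual hub cache layout: <model_cache>/refs/main holds a commit
# hash and the weights live under <model_cache>/snapshots/<hash>/.
snapshot_dir() {
    model_cache="$1"
    rev="$(cat "$model_cache/refs/main")" || return 1
    printf '%s\n' "$model_cache/snapshots/$rev"
}

# Default cache location for microsoft/phi-2; override MODEL_CACHE if yours differs.
MODEL_CACHE="${MODEL_CACHE:-$HOME/.cache/huggingface/hub/models--microsoft--phi-2}"

# Only resolve the path if the model has actually been downloaded.
if [ -f "$MODEL_CACHE/refs/main" ]; then
    SNAPSHOT="$(snapshot_dir "$MODEL_CACHE")"
    echo "snapshot directory: $SNAPSHOT"
fi
```

You can then pass `"$SNAPSHOT"/model-0000*-of-00002.safetensors` to the `tensor-tools` commands in place of the full hardcoded path.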