radames/Candle-phi1-phi2-wasm-demo · Need help with quantizing the original model

Here are the steps. You need Rust installed on your system.

git clone https://github.com/huggingface/candle
cd candle
cargo build

# Optional: this makes sure the original model is in `~/.cache/huggingface/hub/models--microsoft--phi-2/`.
# Run the phi-2 example so that Candle downloads the weights for you.
cargo run --example phi --release -- --prompt "USER: What would you do on a sunny day in Paris?\nASSISTANT:" --sample-len 200 --model 2

# then run the quantization
mkdir phi-2-quantized
# q40 (4-bit); the flag below selects q40, although the output file is named q4_1
cargo run --example tensor-tools -- quantize ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/d3186761bf5c4409f7679359284066c25ab668ee/model-0000*-of-00002.safetensors --quantization q40 --out-file phi-2-quantized/model-v2-q4_1.gguf

# q80
cargo run --example tensor-tools -- quantize ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/d3186761bf5c4409f7679359284066c25ab668ee/model-0000*-of-00002.safetensors --quantization q80 --out-file phi-2-quantized/model-v2-q80.gguf
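The snapshot hash in the paths above is specific to one download and may differ on your machine. Here is a minimal sketch for looking up the snapshot directory instead of hard-coding it. The `find_snapshot` helper and the `MODEL_CACHE` variable are hypothetical names introduced for illustration. The script assumes the cache layout shown above and only prints the quantize commands (a dry run) rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the first snapshot directory found under a
# model's cache directory (no trailing slash). Returns 1 if none exists.
# $1: model cache dir, e.g. ~/.cache/huggingface/hub/models--microsoft--phi-2
find_snapshot() {
  for dir in "$1"/snapshots/*/; do
    [ -d "$dir" ] && { printf '%s\n' "${dir%/}"; return 0; }
  done
  return 1
}

# Assumed default location, taken from the paths used in the steps above.
MODEL_CACHE="${MODEL_CACHE:-$HOME/.cache/huggingface/hub/models--microsoft--phi-2}"

if SNAPSHOT="$(find_snapshot "$MODEL_CACHE")"; then
  # Dry run: print one quantize command per quantization type.
  for q in q40 q80; do
    echo "cargo run --example tensor-tools -- quantize $SNAPSHOT/model-0000*-of-00002.safetensors --quantization $q --out-file phi-2-quantized/model-v2-$q.gguf"
  done
else
  echo "No snapshot found under $MODEL_CACHE; run the phi example first."
fi
```

If the printed commands look right, copy and run them from the `candle` directory. If more than one snapshot exists, this picks the first in glob order, which may not be the newest.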