How to convert original model to q4f16 or q4 for web?

by nickelshh

How to convert the original model to q4f16 or q4 for web? It seems that converting with the Optimum CLI and then applying quantize_dynamic with QInt4 doesn't work in onnxruntime-web.
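For what it's worth, the q4/q4f16 ONNX weights used with transformers.js / onnxruntime-web are generally produced with onnxruntime's block-wise MatMul4BitsQuantizer rather than quantize_dynamic, which only covers 8-bit weight types in most released versions. A minimal sketch of that path — the file paths, block size, and the q4f16 ordering are my assumptions, not something confirmed in this thread:

```python
# Sketch: block-wise 4-bit weight-only quantization with onnxruntime.
# Paths and block_size are placeholders; adjust for your export.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("model/model.onnx")

# Quantize MatMul weights into 4-bit blocks (q4). quantize_dynamic does not
# produce this format; MatMul4BitsQuantizer rewrites MatMul -> MatMulNBits.
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()
quantizer.model.save_model_to_file("model/model_q4.onnx", use_external_data_format=True)

# For q4f16, the remaining float32 tensors are additionally cast to float16
# (assumed ordering: quantize first, then cast; keep_io_types keeps the
# graph inputs/outputs in float32).
from onnxconverter_common import float16

fp16_proto = float16.convert_float_to_float16(quantizer.model.model, keep_io_types=True)
onnx.save_model(fp16_proto, "model/model_q4f16.onnx", save_as_external_data=True)
```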

I converted the model with this script, which also takes care of the 4-bit quantization:
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py

I used "python -m onnxruntime_genai.models.builder -m ~/models/Phi-3.5-mini-instruct/ -o ./model/ms -p int4 -e web", and the inference results seem to be all incorrect compared to the model I downloaded from your repo.
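One way to narrow down where it breaks (a debugging sketch, not something from this thread): rebuild with -e cpu and smoke-test the int4 model through the onnxruntime-genai Python API. If the CPU int4 build generates sensible text, the quantization itself is fine and the problem is specific to the web build or runtime. The API below follows the onnxruntime-genai examples from around its 0.3/0.4 releases and may differ in newer versions; the prompt and output directory are placeholders:

```python
# Sketch: smoke-test an int4 model built with
#   python -m onnxruntime_genai.models.builder \
#     -m ~/models/Phi-3.5-mini-instruct/ -o ./model/cpu -p int4 -e cpu
import onnxruntime_genai as og

model = og.Model("./model/cpu")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)
params.input_ids = tokenizer.encode("<|user|>\nWhat is 2 + 2?<|end|>\n<|assistant|>\n")

# Garbage output here would point at the int4 conversion itself;
# sensible output points at the web build/runtime instead.
output_tokens = model.generate(params)
print(tokenizer.decode(output_tokens[0]))
```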
