(modelopt) PS E:\ModelOpt_Windows_Scripts_2\modelopt-windows-scripts\ONNX_PTQ> python quantize_script.py --model_name=nvidia/Nemotron-Mini-4B-Instruct --onnx_path=E:\model_store\genai\nemotron-mini-4b-instruct-fp16-dml-genai\opset_21\model.onnx --output_path="E:\model_store\genai\nemotron-mini-4b-instruct-fp16-dml-genai\opset_21\default_quant_dml_ep_calib\model.onnx"
--Quantize-Script-- algo=awq_lite, dataset=cnn, calib_size=32, batch_size=1, block_size=128, add-position-ids=True, past-kv=True, rcalib=False, device=cpu, use_zero_point=False
--Quantize-Script-- awqlite_alpha_step=0.1, awqlite_fuse_nodes=False, awqlite_run_per_subgraph=False, awqclip_alpha_step=0.05, awqclip_alpha_min=0.5, awqclip_bsz_col=1024, calibration_eps=['dml']
C:\Users\vrl\miniconda3\envs\modelopt\Lib\site-packages\transformers\models\auto\configuration_auto.py:1002: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
C:\Users\vrl\miniconda3\envs\modelopt\Lib\site-packages\transformers\models\auto\tokenization_auto.py:809: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
--Quantize-Script-- number_of_batched_samples=32, batch-input-ids-list-len=32, batched_attention_mask=32
--Quantize-Script-- number of batched inputs = 32
INFO:root: Quantizing the model....
INFO:root:Quantization Mode: int4
INFO:root:Finding quantizable weights and augmenting graph output with input activations
INFO:root:Augmenting took 0.03900003433227539 seconds
INFO:root:Saving the model took 35.37520098686218 seconds
2024-11-05 06:08:38.8247274 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-11-05 06:08:38.8385074 [W:onnxruntime:, session_state.cc:1170 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Getting activation names maps...: 100%|██████████████████████████████████████████████████████| 192/192 [00:00