Phi-3-small-8k-instruct-onnx
#19
opened by internetUser
There are ONNX CPU versions of Phi-3-mini and Phi-3-medium, but not of Phi-3-small. This model is a perfect fit for CPU usage. If possible, could you provide Phi-3-small-8k-instruct-onnx-cpu? If not, could you indicate how to convert this model to ONNX?
As mentioned here, the SparseAttention operator currently has a kernel implementation only for CUDA. A CPU kernel implementation for SparseAttention is in progress. Once it is complete, we will publish optimized and quantized ONNX models for Phi-3-small that run on CPU.
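In the meantime, a conversion for the CUDA target can be sketched with the ONNX Runtime GenAI model builder. This is a sketch, not an official recipe: it assumes the `onnxruntime-genai` package is installed, that the builder supports this checkpoint, and that you have network access to download the Hugging Face weights; the output folder and cache directory names are placeholders.

```shell
# Sketch: convert a Phi-3 checkpoint to ONNX with the ONNX Runtime GenAI
# model builder (assumes `pip install onnxruntime-genai` and network access).
# For Phi-3-small this targets CUDA only, since SparseAttention has no CPU
# kernel yet; a CPU export would fail at that operator.
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-small-8k-instruct \
    -o ./phi3-small-8k-onnx \
    -p fp16 \
    -e cuda \
    -c ./hf_cache
```

Once the CPU kernel lands, swapping `-e cuda` for `-e cpu` (with an int4 precision such as `-p int4`) would be the natural way to produce the CPU variant requested above.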
kvaishnavi changed discussion status to closed