Phi-3-small-8k-instruct-onnx
#19
opened by internetUser
There are ONNX CPU versions of Phi-3-mini and Phi-3-medium, but not of Phi-3-small. This model is a perfect fit for CPU usage. If possible, could you provide Phi-3-small-8k-instruct-onnx-cpu? If not, could you indicate how to convert this model to ONNX?
As mentioned here, the SparseAttention operator currently has a kernel implementation only for CUDA. A CPU kernel implementation for SparseAttention is in progress. Once it is complete, we will publish optimized and quantized ONNX models for Phi-3-small that run on CPU.
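In the meantime, a conversion for the CUDA target can be sketched with the ONNX Runtime GenAI model builder. This is a sketch, not an official recipe: it assumes the `onnxruntime-genai` package is installed, that the builder supports this checkpoint, and that you have network access to download the Hugging Face weights; the output folder and cache directory names are placeholders.

```shell
# Sketch: convert a Phi-3 checkpoint to ONNX with the ONNX Runtime GenAI
# model builder (assumes `pip install onnxruntime-genai` and network access).
# For Phi-3-small this targets CUDA only, since SparseAttention has no CPU
# kernel yet; a CPU export would fail at that operator.
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-small-8k-instruct \
    -o ./phi3-small-8k-onnx \
    -p fp16 \
    -e cuda \
    -c ./hf_cache
```

Once the CPU kernel lands, swapping `-e cuda` for `-e cpu` (with an int4 precision such as `-p int4`) would be the natural way to produce the CPU variant requested above.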
kvaishnavi changed discussion status to closed