kvaishnavi committed · Commit 2585943 · Parent(s): 293f8d5

Update README

README.md CHANGED
@@ -34,9 +34,9 @@ How do you know which is the best ONNX model for you:
 - I don't know → Review this [guide](https://www.microsoft.com/en-us/windows/learning-center/how-to-check-gpu) to see whether you have a GPU in your Windows machine.
 - Yes → Access the Hugging Face DirectML ONNX models and instructions at [Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml).
 - No → Do you have a NVIDIA GPU?
-- I don't know → Review this [guide](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu) to see whether you have a CUDA-capable GPU.
-
-
+- I don't know → Review this [guide](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu) to see whether you have a CUDA-capable GPU.
+- Yes → Access the Hugging Face CUDA ONNX models and instructions at [Phi-3-medium-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda) for NVIDIA GPUs.
+- No → Access the Hugging Face ONNX models for CPU devices and instructions at [Phi-3-medium-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cpu)
 
 ## How to Get Started with the Model
 To support the Phi-3 models across a range of devices, platforms, and EP backends, we introduce a new API to wrap several aspects of generative AI inferencing. This API makes it easy to drag and drop LLMs straight into your app. To run the early version of these models with ONNX, follow the steps [here](http://aka.ms/generate-tutorial). You can also test this with a [chat app](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/chat_app).
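The Generate() API flow described in the paragraph above can be sketched as follows. This is a minimal sketch, not taken from this README: it assumes the `onnxruntime-genai` Python package and the Phi-3 chat template published on the model cards, and the class and method names (`og.Model`, `og.Tokenizer`, `og.GeneratorParams`, `model.generate`) follow early published examples and may differ in later releases.

```python
def build_phi3_prompt(user_message: str) -> str:
    """Wrap a user message in the Phi-3 chat template (format assumed
    from the model cards, not stated in this README)."""
    return f"<|user|>\n{user_message} <|end|>\n<|assistant|>"


def generate(model_dir: str, user_message: str, max_length: int = 256) -> str:
    """Run one prompt through a downloaded Phi-3 ONNX model folder.
    All API names here are assumptions based on early onnxruntime-genai
    examples; check the linked tutorial for the current interface."""
    import onnxruntime_genai as og  # pip install onnxruntime-genai

    model = og.Model(model_dir)          # loads the ONNX weights + genai config
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    params.input_ids = tokenizer.encode(build_phi3_prompt(user_message))
    output_tokens = model.generate(params)[0]
    return tokenizer.decode(output_tokens)
```

Pointing `model_dir` at one of the downloaded model folders from the links above (DirectML, CUDA, or CPU) selects the matching execution provider via the model's bundled configuration.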
@@ -72,7 +72,7 @@ We measured the performance of DirectML and ONNX Runtime's new Generate() API wi
 Stay tuned for additional performance improvements in the coming weeks thanks to optimized drivers from our hardware partners, along with additional updates to the ONNX Runtime Generate() API.
 
 | Batch Size, Prompt Length | Block Size = 32 | Block Size = 128 |
-|---------------------------|-----------------|------------------|
+|---------------------------|-----------------|------------------|
 | 1, 16 | 66.60 | 72.26 |
 
 
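The table above compares the two block sizes; assuming the figures are tokens per second (units are not stated in this excerpt), the relative difference works out as follows:

```python
# Derived comparison for the batch size 1, prompt length 16 row above,
# assuming the table's figures are tokens per second.
tps_block32 = 66.60
tps_block128 = 72.26

speedup = tps_block128 / tps_block32        # ratio of the two throughputs
ms_per_token_32 = 1000.0 / tps_block32      # per-token latency at block size 32
ms_per_token_128 = 1000.0 / tps_block128    # per-token latency at block size 128

print(f"Block size 128 is {(speedup - 1) * 100:.1f}% faster")  # → 8.5% faster
```

So for this configuration the larger block size buys roughly 1.2 ms per generated token.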
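The decision tree in the updated lines 34–39 maps cleanly onto a small lookup. The repo IDs come straight from the links above; the function and argument names are illustrative only:

```python
def pick_phi3_medium_repo(windows_gpu: bool, nvidia_gpu: bool) -> str:
    """Return the Hugging Face repo matching the README's decision tree.
    windows_gpu: do you have a GPU in your Windows machine? (→ DirectML)
    nvidia_gpu:  otherwise, do you have a CUDA-capable NVIDIA GPU? (→ CUDA)
    Anything else falls through to the CPU build."""
    if windows_gpu:
        return "microsoft/Phi-3-medium-128k-instruct-onnx-directml"
    if nvidia_gpu:
        return "microsoft/Phi-3-medium-128k-instruct-onnx-cuda"
    return "microsoft/Phi-3-medium-128k-instruct-onnx-cpu"
```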