petermcaughan committed • 92d47fb
Parent(s): 7353d4f
Update README.md

README.md CHANGED
@@ -30,7 +30,7 @@ See the [usage instructions](#usage-example) for how to inference this model wit
 
 #### Latency for token generation
 
-Below is average latency of generating a token using a prompt of varying size using NVIDIA A100-SXM4-80GB GPU
+Below is average latency of generating a token using a prompt of varying size using NVIDIA A100-SXM4-80GB GPU, taken from the [ORT benchmarking script for Mistral](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/README.md#benchmark-mistral)
 
 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|
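The table introduced by this change reports average per-token generation latency across prompt lengths. A minimal sketch of how such numbers can be collected is below; it is not the actual ORT benchmarking script (linked in the diff above), and `benchmark_token_latency` and its dummy `generate_fn` are hypothetical stand-ins for a real model's generation step (PyTorch with `torch.compile`, ONNX Runtime, etc.):

```python
import time

def benchmark_token_latency(generate_fn, prompt_lengths, num_tokens=16):
    """Measure average per-token latency for varying prompt sizes.

    generate_fn(prompt_len) stands in for one token-generation step;
    substitute a real model call to benchmark an actual backend.
    Returns a dict mapping prompt length -> seconds per generated token.
    """
    results = {}
    for plen in prompt_lengths:
        start = time.perf_counter()
        for _ in range(num_tokens):
            generate_fn(plen)  # one decoding step on a prompt of this length
        elapsed = time.perf_counter() - start
        results[plen] = elapsed / num_tokens  # average seconds per token
    return results

# Usage with a dummy workload standing in for a real model:
latencies = benchmark_token_latency(lambda n: sum(range(n)), [32, 256, 1024])
```

A real harness would also discard warm-up iterations (important for `torch.compile`, which compiles on first call) and vary batch size, as the table's columns suggest.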
|