petermcaughan committed • 92d47fb
Parent(s): 7353d4f
Update README.md

README.md CHANGED
@@ -30,7 +30,7 @@ See the [usage instructions](#usage-example) for how to inference this model wit
 
 #### Latency for token generation
 
-Below is average latency of generating a token using a prompt of varying size using NVIDIA A100-SXM4-80GB GPU
+Below is average latency of generating a token using a prompt of varying size using NVIDIA A100-SXM4-80GB GPU, taken from the [ORT benchmarking script for Mistral](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/README.md#benchmark-mistral)
 
 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|
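The table introduced by this change reports average per-token generation latency across prompt lengths. A minimal sketch of how such numbers can be collected is below; it is not the actual ORT benchmarking script (linked in the diff above), and `benchmark_token_latency` and its dummy `generate_fn` are hypothetical stand-ins for a real model's generation step (PyTorch with `torch.compile`, ONNX Runtime, etc.):

```python
import time

def benchmark_token_latency(generate_fn, prompt_lengths, num_tokens=16):
    """Measure average per-token latency for varying prompt sizes.

    generate_fn(prompt_len) stands in for one token-generation step;
    substitute a real model call to benchmark an actual backend.
    Returns a dict mapping prompt length -> seconds per generated token.
    """
    results = {}
    for plen in prompt_lengths:
        start = time.perf_counter()
        for _ in range(num_tokens):
            generate_fn(plen)  # one decoding step on a prompt of this length
        elapsed = time.perf_counter() - start
        results[plen] = elapsed / num_tokens  # average seconds per token
    return results

# Usage with a dummy workload standing in for a real model:
latencies = benchmark_token_latency(lambda n: sum(range(n)), [32, 256, 1024])
```

A real harness would also discard warm-up iterations (important for `torch.compile`, which compiles on first call) and vary batch size, as the table's columns suggest.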
|