Update README.md
README.md
@@ -27,7 +27,7 @@ This model is the ONNX version of [https://huggingface.co/SamLowe/roberta-base-g
 - is faster in inference than normal Transformers, particularly for smaller batch sizes
 - in my tests about 2x to 3x as fast for a batch size of 1 on a 8 core 11th gen i7 CPU using ONNXRuntime
 
-###
+### Quantized (INT8) ONNX version
 
 `onnx/model_quantized.onnx` is the int8 quantized version
 