task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}
Number of evaluation samples: All dataset
Fixed parameters:
- model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
- dataset:
  - path: conll2003
  - eval_split: validation
  - data_keys: {'primary': 'tokens'}
  - ref_keys: ['ner_tags']
  - calibration_split: train
- per_channel: False
- calibration:
  - method: minmax
  - num_calibration_samples: 100
- framework: onnxruntime
- framework_args:
  - opset: 11
  - optimization_level: 1
- aware_training: False

Benchmarked parameters:
- quantization_approach: dynamic, static
- operators_to_quantize: ['Add'], ['Add', 'MatMul']
- node_exclusion: [], ['layernorm', 'gelu', 'residual', 'gather', 'softmax']
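The three benchmarked parameters take two values each and are crossed with one another, which is why every results table has 2 × 2 × 2 = 8 configuration rows. A minimal sketch of that expansion as a plain Cartesian product; the `expand_grid` helper and the dict layout are hypothetical illustrations, not the API of the tool that produced this report.

```python
from itertools import product

# Benchmarked parameters, copied from the list above.
benchmarked = {
    "quantization_approach": ["dynamic", "static"],
    "operators_to_quantize": [["Add"], ["Add", "MatMul"]],
    "node_exclusion": [[], ["layernorm", "gelu", "residual", "gather", "softmax"]],
}

# A subset of the fixed parameters, shared by every configuration.
fixed = {
    "model_name_or_path": "elastic/distilbert-base-uncased-finetuned-conll03-english",
    "per_channel": False,
    "framework": "onnxruntime",
    "framework_args": {"opset": 11, "optimization_level": 1},
}


def expand_grid(fixed, benchmarked):
    """Hypothetical helper: one config dict per combination of benchmarked values."""
    keys = list(benchmarked)
    configs = []
    for values in product(*(benchmarked[k] for k in keys)):
        config = dict(fixed)  # shallow copy of the shared fixed parameters
        config.update(zip(keys, values))
        configs.append(config)
    return configs


configs = expand_grid(fixed, benchmarked)
print(len(configs))  # 8
```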
Evaluation
Non-time metrics

quantization_approach | operators_to_quantize | node_exclusion | precision (original) | precision (optimized) | recall (original) | recall (optimized) | f1 (original) | f1 (optimized) | accuracy (original) | accuracy (optimized)
---|---|---|---|---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 0.936 | 0.934 | 0.944 | 0.942 | 0.940 | 0.938 | 0.988 | 0.988
dynamic | ['Add', 'MatMul'] | [] | 0.936 | 0.934 | 0.944 | 0.942 | 0.940 | 0.938 | 0.988 | 0.988
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 0.936 | 0.936 | 0.944 | 0.944 | 0.940 | 0.940 | 0.988 | 0.988
dynamic | ['Add'] | [] | 0.936 | 0.936 | 0.944 | 0.944 | 0.940 | 0.940 | 0.988 | 0.988
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 0.936 | 0.904 | 0.944 | 0.921 | 0.940 | 0.912 | 0.988 | 0.984
static | ['Add', 'MatMul'] | [] | 0.936 | 0.065 | 0.944 | 0.243 | 0.940 | 0.103 | 0.988 | 0.357
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 0.936 | 0.909 | 0.944 | 0.930 | 0.940 | 0.919 | 0.988 | 0.986
static | ['Add'] | [] | 0.936 | 0.050 | 0.944 | 0.160 | 0.940 | 0.076 | 0.988 | 0.311

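Dynamic quantization keeps every metric within 0.002 of the original model, whereas static quantization without node exclusion collapses (f1 falls to 0.103 and 0.076). Excluding the listed node types brings static f1 back to 0.912 and 0.919. The snippet below computes the absolute f1 loss per configuration from the table values; the short string keys are labels chosen only for this sketch.

```python
# f1 (original, optimized) per configuration, copied from the table above.
# Keys are (quantization_approach, operators_to_quantize, node_exclusion);
# "excl" stands for ['layernorm', 'gelu', 'residual', 'gather', 'softmax'].
F1 = {
    ("dynamic", "Add+MatMul", "excl"): (0.940, 0.938),
    ("dynamic", "Add+MatMul", "none"): (0.940, 0.938),
    ("dynamic", "Add", "excl"): (0.940, 0.940),
    ("dynamic", "Add", "none"): (0.940, 0.940),
    ("static", "Add+MatMul", "excl"): (0.940, 0.912),
    ("static", "Add+MatMul", "none"): (0.940, 0.103),
    ("static", "Add", "excl"): (0.940, 0.919),
    ("static", "Add", "none"): (0.940, 0.076),
}


def f1_drop(table):
    """Absolute f1 loss (original minus optimized) per config, rounded to 3 decimals."""
    return {cfg: round(orig - opt, 3) for cfg, (orig, opt) in table.items()}


drops = f1_drop(F1)
# Print configurations from largest to smallest f1 loss.
for cfg, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(cfg, drop)
```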
Time metrics
Time benchmarks were run for 15 seconds per config.
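The reported numbers look consistent with a loop that calls the model back to back for a fixed 15-second window: each throughput value is, to rounding, a whole number of calls divided by 15 (e.g. 30.40 = 456 / 15), and throughput stays close to 1000 / latency_mean (32.90 ms gives about 30.40/s). Under that reading, throughput counts forward calls per second rather than individual sequences. A minimal sketch of such a fixed-duration loop, assuming that interpretation; `benchmark` and the sleep-based workload are hypothetical stand-ins, not the harness that produced these tables.

```python
import time
from statistics import mean


def benchmark(run_once, duration_s=15.0):
    """Call run_once repeatedly until duration_s seconds have elapsed.

    Returns (mean latency in ms, calls per second). Throughput here counts
    calls (one per batch), not the individual samples inside a batch.
    """
    latencies_ms = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        t0 = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    throughput = len(latencies_ms) / duration_s
    return mean(latencies_ms), throughput


if __name__ == "__main__":
    # Hypothetical stand-in workload; a real run would call the model's forward pass.
    lat, thr = benchmark(lambda: time.sleep(0.01), duration_s=1.0)
    print(f"latency_mean={lat:.2f} ms, throughput={thr:.2f} /s")
```

Because the last call may start just before the window closes, total call time can slightly exceed the window, which is why throughput can sit a little above 1000 / latency_mean.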
Below, time metrics for batch size = 1, input length = 32.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 32.90 | 7.03 | 30.40 | 142.20
dynamic | ['Add', 'MatMul'] | [] | 48.27 | 7.68 | 20.73 | 130.33
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 33.74 | 14.73 | 29.67 | 67.93
dynamic | ['Add'] | [] | 33.49 | 14.17 | 29.87 | 70.60
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 47.72 | 8.20 | 21.00 | 121.93
static | ['Add', 'MatMul'] | [] | 47.87 | 10.58 | 20.93 | 94.60
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 45.77 | 19.00 | 21.87 | 52.67
static | ['Add'] | [] | 44.67 | 18.77 | 22.40 | 53.33

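A latency speedup (original / optimized mean latency) summarizes each row of the table above in a single number. A short sketch computing it for batch size = 1, input length = 32; the dict keys and helper names are hypothetical labels chosen for illustration.

```python
# Mean latency in ms (original, optimized) for batch size = 1, input length = 32,
# copied from the table above. Keys are (approach, operators, node_exclusion);
# "excl" stands for ['layernorm', 'gelu', 'residual', 'gather', 'softmax'].
LATENCY_B1_L32 = {
    ("dynamic", "Add+MatMul", "excl"): (32.90, 7.03),
    ("dynamic", "Add+MatMul", "none"): (48.27, 7.68),
    ("dynamic", "Add", "excl"): (33.74, 14.73),
    ("dynamic", "Add", "none"): (33.49, 14.17),
    ("static", "Add+MatMul", "excl"): (47.72, 8.20),
    ("static", "Add+MatMul", "none"): (47.87, 10.58),
    ("static", "Add", "excl"): (45.77, 19.00),
    ("static", "Add", "none"): (44.67, 18.77),
}


def speedups(table):
    """Original / optimized latency per config; values above 1 mean the quantized model is faster."""
    return {cfg: round(orig / opt, 2) for cfg, (orig, opt) in table.items()}


def implied_throughput(latency_ms):
    """Calls per second implied by a mean latency, assuming calls run back to back."""
    return 1000.0 / latency_ms


s = speedups(LATENCY_B1_L32)
best = max(s, key=s.get)
print(best, s[best])
```

The largest ratio here (about 6.29×) comes from dynamic quantization of ['Add', 'MatMul'] without exclusions. Note, however, that the original baseline should not depend on the quantization settings, yet its latency ranges from 32.90 to 48.27 ms across rows, so part of each ratio likely reflects run-to-run variation.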
Below, time metrics for batch size = 1, input length = 64.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 59.15 | 13.60 | 16.93 | 73.53
dynamic | ['Add', 'MatMul'] | [] | 44.01 | 12.60 | 22.73 | 79.40
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 60.50 | 29.87 | 16.53 | 33.53
dynamic | ['Add'] | [] | 45.35 | 24.10 | 22.07 | 41.53
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 59.98 | 16.08 | 16.73 | 62.20
static | ['Add', 'MatMul'] | [] | 43.23 | 19.02 | 23.20 | 52.60
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 43.15 | 32.96 | 23.20 | 30.40
static | ['Add'] | [] | 44.01 | 31.68 | 22.80 | 31.60

Below, time metrics for batch size = 1, input length = 128.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 55.20 | 25.72 | 18.13 | 38.93
dynamic | ['Add', 'MatMul'] | [] | 73.52 | 26.70 | 13.67 | 37.47
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 71.60 | 53.26 | 14.00 | 18.80
dynamic | ['Add'] | [] | 70.39 | 56.68 | 14.27 | 17.67
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 71.34 | 31.75 | 14.07 | 31.53
static | ['Add', 'MatMul'] | [] | 73.55 | 37.95 | 13.60 | 26.40
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 70.28 | 62.70 | 14.27 | 16.00
static | ['Add'] | [] | 63.86 | 61.64 | 15.67 | 16.27

Below, time metrics for batch size = 4, input length = 32.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 70.41 | 22.67 | 14.27 | 44.13
dynamic | ['Add', 'MatMul'] | [] | 71.65 | 21.44 | 14.00 | 46.67
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 71.72 | 55.16 | 14.00 | 18.13
dynamic | ['Add'] | [] | 55.56 | 43.87 | 18.00 | 22.80
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 55.45 | 27.83 | 18.07 | 36.00
static | ['Add', 'MatMul'] | [] | 66.57 | 34.45 | 15.07 | 29.07
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 55.23 | 59.31 | 18.13 | 16.87
static | ['Add'] | [] | 58.80 | 66.03 | 17.07 | 15.20

Below, time metrics for batch size = 4, input length = 64.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 117.71 | 43.93 | 8.53 | 22.80
dynamic | ['Add', 'MatMul'] | [] | 90.01 | 43.27 | 11.13 | 23.13
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 94.34 | 107.02 | 10.60 | 9.40
dynamic | ['Add'] | [] | 119.11 | 82.46 | 8.40 | 12.13
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 120.57 | 54.70 | 8.33 | 18.33
static | ['Add', 'MatMul'] | [] | 120.00 | 57.85 | 8.40 | 17.33
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 119.57 | 92.50 | 8.40 | 10.87
static | ['Add'] | [] | 117.35 | 102.09 | 8.53 | 9.80

Below, time metrics for batch size = 4, input length = 128.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 220.69 | 94.33 | 4.53 | 10.67
dynamic | ['Add', 'MatMul'] | [] | 170.04 | 81.68 | 5.93 | 12.27
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 188.59 | 171.79 | 5.33 | 5.87
dynamic | ['Add'] | [] | 219.80 | 163.62 | 4.60 | 6.13
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 220.25 | 94.05 | 4.60 | 10.67
static | ['Add', 'MatMul'] | [] | 222.90 | 135.06 | 4.53 | 7.47
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 177.41 | 211.89 | 5.67 | 4.73
static | ['Add'] | [] | 168.30 | 201.88 | 6.00 | 5.00

Below, time metrics for batch size = 8, input length = 32.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 106.46 | 42.35 | 9.47 | 23.67
dynamic | ['Add', 'MatMul'] | [] | 88.68 | 43.33 | 11.33 | 23.13
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 91.32 | 92.08 | 11.00 | 10.87
dynamic | ['Add'] | [] | 88.33 | 94.18 | 11.33 | 10.67
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 107.47 | 44.74 | 9.33 | 22.40
static | ['Add', 'MatMul'] | [] | 118.39 | 64.56 | 8.47 | 15.53
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 87.05 | 111.36 | 11.53 | 9.00
static | ['Add'] | [] | 116.96 | 98.82 | 8.60 | 10.13

Below, time metrics for batch size = 8, input length = 64.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 165.67 | 87.71 | 6.07 | 11.47
dynamic | ['Add', 'MatMul'] | [] | 214.59 | 87.88 | 4.67 | 11.40
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 216.06 | 163.75 | 4.67 | 6.13
dynamic | ['Add'] | [] | 176.69 | 209.28 | 5.67 | 4.80
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 215.12 | 86.90 | 4.67 | 11.53
static | ['Add', 'MatMul'] | [] | 215.99 | 130.39 | 4.67 | 7.73
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 213.87 | 224.50 | 4.73 | 4.47
static | ['Add'] | [] | 211.16 | 193.01 | 4.80 | 5.20

Below, time metrics for batch size = 8, input length = 128.

quantization_approach | operators_to_quantize | node_exclusion | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 391.16 | 183.35 | 2.60 | 5.47
dynamic | ['Add', 'MatMul'] | [] | 414.42 | 154.52 | 2.47 | 6.53
dynamic | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 314.12 | 323.94 | 3.20 | 3.13
dynamic | ['Add'] | [] | 408.15 | 325.03 | 2.47 | 3.13
static | ['Add', 'MatMul'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 337.57 | 205.59 | 3.00 | 4.87
static | ['Add', 'MatMul'] | [] | 375.10 | 225.09 | 2.67 | 4.47
static | ['Add'] | ['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 409.68 | 493.00 | 2.47 | 2.07
static | ['Add'] | [] | 397.28 | 397.74 | 2.53 | 2.53

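In every time table, throughput stays close to 1000 / latency_mean whatever the batch size (for example 42.35 ms and 23.67/s at batch size 8, input length 32), so it appears to count forward calls rather than sequences. Comparing batch sizes therefore needs a per-sample view. A minimal sketch under that assumption, using the optimized latencies of dynamic quantization of ['Add', 'MatMul'] with the node exclusions; the helper names are hypothetical.

```python
# Optimized mean latency (ms) for dynamic quantization of ['Add', 'MatMul'] with
# node_exclusion ['layernorm', 'gelu', 'residual', 'gather', 'softmax'], copied
# from the time tables. Keys are (batch_size, input_length).
optimized_ms = {
    (1, 32): 7.03, (4, 32): 22.67, (8, 32): 42.35,
    (1, 64): 13.60, (4, 64): 43.93, (8, 64): 87.71,
    (1, 128): 25.72, (4, 128): 94.33, (8, 128): 183.35,
}


def per_sample_ms(latency_ms, batch_size):
    """Time per sequence, assuming one call processes the whole batch."""
    return latency_ms / batch_size


def samples_per_s(latency_ms, batch_size):
    """Sequences per second implied by the mean latency of one batch."""
    return 1000.0 * batch_size / latency_ms


for (bs, length), ms in sorted(optimized_ms.items()):
    print(f"bs={bs} len={length}: {per_sample_ms(ms, bs):.2f} ms/sample, "
          f"{samples_per_s(ms, bs):.1f} samples/s")
```

For this configuration, time per sequence drops only modestly with batching: at input length 32 it goes from 7.03 ms at batch size 1 to about 5.29 ms at batch size 8.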