Update README.md
README.md CHANGED
@@ -38,12 +38,8 @@ The linear modules **bert.encoder.layer.2.output.dense, bert.encoder.layer.5.int
 
 ### Test result
 
-- Batch size = 8
-- [Amazon Web Services](https://aws.amazon.com/) c6i.xlarge (Intel Ice Lake: 4 vCPUs, 8 GB memory) instance.
-
 | |INT8|FP32|
 |---|:---:|:---:|
-| **Throughput (samples/sec)** |16.55|9.333|
 | **Accuracy (eval-accuracy)** |0.7838|0.7915|
 | **Model size (MB)** |133|418|
 
@@ -55,7 +51,3 @@ int8_model = OptimizedModel.from_pretrained(
     'Intel/bert-base-uncased-finetuned-swag-int8-static',
 )
 ```
-
-Notes:
-- The INT8 model outperforms the FP32 model only when the CPU is fully occupied; otherwise INT8 can appear to be slower than FP32.
-
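
The diff context only shows the tail of the loading call. Below is a minimal end-to-end sketch of loading and querying this model, assuming the optimum-intel import path `optimum.intel.neural_compressor.quantization.OptimizedModel` (the exact module path varies across optimum-intel releases) and that the loaded INT8 model behaves like transformers' `BertForMultipleChoice`; the context/endings strings are made-up examples, not from the model card.

```python
# Minimal usage sketch. ASSUMPTIONS: the OptimizedModel import path below
# matches your optimum-intel release, and the loaded INT8 model accepts
# inputs like transformers' BertForMultipleChoice. Example strings are made up.
from transformers import AutoTokenizer
from optimum.intel.neural_compressor.quantization import OptimizedModel  # assumed path

model_id = 'Intel/bert-base-uncased-finetuned-swag-int8-static'
tokenizer = AutoTokenizer.from_pretrained(model_id)
int8_model = OptimizedModel.from_pretrained(model_id)

# SWAG-style multiple choice: score one context against candidate endings.
context = "The chef puts the tray in the oven and"
endings = ["waits for the timer to ring.", "rides a bicycle into the sea."]
enc = tokenizer([context] * len(endings), endings, return_tensors="pt", padding=True)

# Multiple-choice heads expect inputs shaped (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}
logits = int8_model(**inputs).logits
print(logits)  # the higher logit marks the more plausible ending
```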
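The removed note about CPU occupancy matters if you try to reproduce the removed throughput row: INT8's advantage only shows up once the cores are saturated. A rough, hypothetical way to measure samples/sec at the original batch size of 8 is sketched below; the `throughput` helper is illustrative and not part of the model card.

```python
# Hypothetical benchmark sketch (not from the model card): time repeated
# forward passes on CPU and convert to samples/sec.
import time
import torch

def throughput(model, inputs, iters=50, batch_size=8):
    """Rough samples/sec; batch_size must match the leading dim of inputs."""
    with torch.no_grad():
        for _ in range(5):              # warm-up passes, excluded from timing
            model(**inputs)
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
        elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed
```

To compare INT8 and FP32 under the "fully occupied" condition the note describes, keep the machine's cores busy during both runs, e.g. by letting torch use all vCPUs via `torch.set_num_threads(4)` on the 4-vCPU c6i.xlarge instance from the removed setup notes.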