Update README.md
Commit 7353d4f, committed by petermcaughan
1 parent: 646bb29
README.md CHANGED
@@ -34,14 +34,14 @@ Below is average latency of generating a token using a prompt of varying size us
 
 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|
-| |
-| 256 | 1 |
-| 1024 | 1 |
-| 2048 | 1 |
-| |
-| 256 | 4 |
-| 1024 | 4 |
-| 2048 | 4 | N/A |
+| 32 | 1 | 32.58ms | 12.08ms |
+| 256 | 1 | 54.54ms | 23.20ms |
+| 1024 | 1 | 100.6ms | 77.49ms |
+| 2048 | 1 | 236.8ms | 144.99ms |
+| 32 | 4 | 63.71ms | 15.32ms |
+| 256 | 4 | 86.74ms | 75.94ms |
+| 1024 | 4 | 380.2ms | 273.9ms |
+| 2048 | 4 | N/A | 554.5ms |
 
 ## Usage Example
 