Update README.md
README.md
CHANGED
@@ -20,5 +20,9 @@ Measured at Wikitext with 4096 context length
 | 5.8438 | 6.9492 |
 
 ## Speed
-
-
+
+Latency and throughput are measured using vLLM (`examples/benchmark_latency.py` and `examples/benchmark_throughput.py` respectively) at single A100-80G.
+
+Latency at batch size 1: 13.5 tokens/s.
+
+Throughput: 0.77 requests/s