---
license: apache-2.0
datasets:
- lambada
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-lm
- int8
- tensorrt
- ENOT-AutoDL
---
# GPT2
This repository contains GPT2 ONNX models compatible with TensorRT:
* gpt2-xl.onnx - GPT2-XL ONNX model for building fp32 or fp16 engines
* gpt2-xl-i8.onnx - GPT2-XL ONNX model for building int8+fp32 engines

Quantization was performed with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework.
Code for building TensorRT engines, along with usage examples, is published on [GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers).
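
The linked repository holds the authors' own build scripts. As a rough alternative sketch, the snippet below composes command lines for NVIDIA's `trtexec` tool, which ships with TensorRT, pairing each ONNX file with the precision listed above. The helper name `trtexec_command` and the `.engine` output file names are hypothetical. The commands are only printed, not executed. If the ONNX inputs have dynamic dimensions, shape flags such as `--minShapes`, `--optShapes` and `--maxShapes` would likely be needed as well, with input names that are not documented here.

```python
import shlex

# Precisions mentioned in this README: fp32 or fp16 for gpt2-xl.onnx,
# int8+fp32 for gpt2-xl-i8.onnx.
SUPPORTED_PRECISIONS = {"fp32", "fp16", "int8"}


def trtexec_command(onnx_path, engine_path, precision="fp32"):
    """Compose (but do not run) a trtexec invocation as an argument list.

    Hypothetical helper for illustration; not part of ENOT-transformers.
    """
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision == "fp16":
        cmd.append("--fp16")  # allow FP16 kernels in addition to FP32
    elif precision == "int8":
        cmd.append("--int8")  # allow INT8 kernels; other layers stay FP32
    return cmd


if __name__ == "__main__":
    # Output engine names are assumptions chosen for this example.
    for onnx_path, precision in [("gpt2-xl.onnx", "fp16"),
                                 ("gpt2-xl-i8.onnx", "int8")]:
        engine_path = onnx_path.replace(".onnx", f"-{precision}.engine")
        print(shlex.join(trtexec_command(onnx_path, engine_path, precision)))
```

The printed commands can be pasted into a shell on a machine where `trtexec` is on the `PATH`.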
## Metrics:
### GPT2-XL
| |TensorRT INT8+FP32|torch FP16|
|---|:---:|:---:|
| **LAMBADA accuracy** |72.11%|71.43%|
### Test environment
* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116
## Latency:
### GPT2-XL
|Input sequence length|Number of generated tokens|TensorRT INT8+FP32, ms|torch FP16, ms|Acceleration|
|:---:|:---:|:---:|:---:|:---:|
|64|64|462|1190|2.58|
|64|128|920|2360|2.54|
|64|256|1890|4710|2.54|
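
To make the table easier to compare across rows, the sketch below derives per-token latency and the torch-to-TensorRT ratio from the reported numbers. It assumes each latency is the total time to generate all tokens in that row, which the table does not state explicitly. Ratios recomputed from the rounded latencies can differ slightly from the Acceleration column, which may have been computed from unrounded measurements.

```python
# Rows copied from the latency table above:
# (input length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms)
ROWS = [
    (64, 64, 462, 1190),
    (64, 128, 920, 2360),
    (64, 256, 1890, 4710),
]


def per_token_ms(total_ms, generated_tokens):
    """Average latency per generated token, assuming total_ms covers them all."""
    return total_ms / generated_tokens


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    for input_len, n_tokens, trt_ms, torch_ms in ROWS:
        print(
            f"in={input_len} out={n_tokens}: "
            f"TensorRT {per_token_ms(trt_ms, n_tokens):.2f} ms/token, "
            f"torch {per_token_ms(torch_ms, n_tokens):.2f} ms/token, "
            f"ratio {speedup(torch_ms, trt_ms):.2f}x"
        )
```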
### Test environment
* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116
## How to use
Examples of inference and the accuracy test are [published on GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers):
```shell
git clone https://github.com/ENOT-AutoDL/ENOT-transformers
```