---
license: apache-2.0
datasets:
- lambada
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-lm
- int8
- tensorrt
- ENOT-AutoDL
---
# GPT2
This repository contains GPT2 ONNX models compatible with TensorRT:
* gpt2-xl.onnx - GPT2-XL ONNX model for fp32 or fp16 engines
* gpt2-xl-i8.onnx - GPT2-XL ONNX model for int8+fp32 engines

The models were quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework.
Code for building TensorRT engines, along with usage examples, is published on [GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers).
## Metrics:
### GPT2-XL
| |TensorRT INT8+FP32|torch FP16|
|---|:---:|:---:|
| **Lambada Acc** |72.11%|71.43%|
### Test environment
* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116
## Latency:
### GPT2-XL
|Input sequence length|Number of generated tokens|TensorRT INT8+FP32, ms|torch FP16, ms|Acceleration|
|:---:|:---:|:---:|:---:|:---:|
|64|64|462|1190|2.58|
|64|128|920|2360|2.54|
|64|256|1890|4710|2.54|
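
The Acceleration column appears to be the torch FP16 latency divided by the TensorRT INT8+FP32 latency; that reading is an assumption, as the ratio is not defined here. The sketch below recomputes it from the table and also derives an average per-token latency. The helper names `speedup` and `ms_per_token` are illustrative only. Because the published latencies are rounded, the recomputed ratios may differ slightly from the Acceleration column.

```python
# Latencies in milliseconds, copied from the table above:
# (input length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms)
ROWS = [
    (64, 64, 462, 1190),
    (64, 128, 920, 2360),
    (64, 256, 1890, 4710),
]


def speedup(trt_ms: float, torch_ms: float) -> float:
    """Assumed definition of 'Acceleration': baseline latency / TensorRT latency."""
    return torch_ms / trt_ms


def ms_per_token(total_ms: float, generated_tokens: int) -> float:
    """Average latency per generated token, in milliseconds."""
    return total_ms / generated_tokens


if __name__ == "__main__":
    for input_len, n_tokens, trt_ms, torch_ms in ROWS:
        print(
            f"input={input_len} generated={n_tokens} "
            f"speedup={speedup(trt_ms, torch_ms):.2f} "
            f"trt={ms_per_token(trt_ms, n_tokens):.2f} ms/token "
            f"torch={ms_per_token(torch_ms, n_tokens):.2f} ms/token"
        )
```

The per-token figure is a simple average over the whole generation run, so it does not separate the cost of processing the 64-token prompt from the cost of each decoding step.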
### Test environment
* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116
## How to use
An inference example and an accuracy test are [published on GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers):
```shell
git clone https://github.com/ENOT-AutoDL/ENOT-transformers
```