---
license: apache-2.0
datasets:
- lambada
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-lm
- int8
- tensorrt
- ENOT-AutoDL
---

# GPT2

This repository contains GPT2 ONNX models compatible with TensorRT:
* gpt2-xl.onnx - GPT2-XL ONNX model for fp32 or fp16 engines
* gpt2-xl-i8.onnx - GPT2-XL ONNX model for int8+fp32 engines

The models were quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework.
Code for building the TensorRT engines, along with examples, is published on [GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers).

## Metrics:

### GPT2-XL

|   |TensorRT INT8+FP32|torch FP16|
|---|:---:|:---:|
| **Lambada Acc** |72.11%|71.43%|

### Test environment

* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116

## Latency:

### GPT2-XL

|Input sequence length|Number of generated tokens|TensorRT INT8+FP32 ms|torch FP16 ms|Acceleration|
|:---:|:---:|:---:|:---:|:---:|
|64|64|462|1190|2.58|
|64|128|920|2360|2.54|
|64|256|1890|4710|2.54|
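
The figures in the table can be cross-checked with a few lines of Python. The sketch below is illustrative only and is not part of the ENOT-transformers code: the `rows` list and the `summarize` helper are hypothetical names, and the numbers are copied from the table as rounded milliseconds. Ratios recomputed from these rounded values may differ in the second decimal from the Acceleration column, which was presumably derived from unrounded timings.

```python
# Latency rows copied from the table above:
# (input sequence length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms)
rows = [
    (64, 64, 462, 1190),
    (64, 128, 920, 2360),
    (64, 256, 1890, 4710),
]


def summarize(rows):
    """Return speedup and per-token latency for each table row (hypothetical helper)."""
    summary = []
    for _input_len, n_generated, trt_ms, torch_ms in rows:
        summary.append(
            {
                "generated": n_generated,
                # Speedup of the TensorRT engine over the torch FP16 baseline.
                "speedup": round(torch_ms / trt_ms, 2),
                # Average time spent per generated token, in milliseconds.
                "trt_ms_per_token": round(trt_ms / n_generated, 2),
                "torch_ms_per_token": round(torch_ms / n_generated, 2),
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize(rows):
        print(entry)
```

The per-token latency stays roughly constant across the three rows for both backends, which is consistent with the near-constant acceleration reported in the table.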

### Test environment

* GPU RTX 4090
* CPU 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116

## How to use

An example of inference and an accuracy test are [published on GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers):
```shell
git clone https://github.com/ENOT-AutoDL/ENOT-transformers
```