	---
	license: apache-2.0
	datasets:
	- lambada
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- text-generation-inference
	- causal-lm
	- int8
	- tensorrt
	- ENOT-AutoDL
	---

	# GPT2

	This repository contains GPT2 ONNX models compatible with TensorRT:
	* gpt2-xl.onnx - GPT2-XL ONNX model for fp32 or fp16 engines
	* gpt2-xl-i8.onnx - GPT2-XL ONNX model for int8+fp32 engines

	The models were quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework.
	Code for building TensorRT engines, along with usage examples, is published on [GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers).
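
As a rough illustration of how the two ONNX files map to engine precisions, the sketch below only prints candidate `trtexec` command lines and does not run them. It assumes the stock `trtexec` tool shipped with TensorRT and its `--onnx`, `--saveEngine`, `--fp16` and `--int8` options. The helper name `trtexec_cmd` and the `.engine` output names are hypothetical, and the build scripts in the ENOT-transformers repository remain the reference way to build these engines. Models with dynamic input shapes would additionally need shape options with the model's actual input names, which are not listed here.

```shell
# Hypothetical helper: print a trtexec command for a given ONNX file and precision.
# It only builds the command string; nothing is executed.
trtexec_cmd() {
  onnx="$1"        # e.g. gpt2-xl.onnx or gpt2-xl-i8.onnx
  precision="$2"   # fp32, fp16 or int8
  cmd="trtexec --onnx=$onnx --saveEngine=${onnx%.onnx}.engine"
  case "$precision" in
    fp32) ;;                          # fp32 needs no extra flag
    fp16) cmd="$cmd --fp16" ;;        # allow fp16 kernels
    int8) cmd="$cmd --int8" ;;        # allow int8 kernels for the quantized model
    *) echo "unknown precision: $precision" >&2; return 1 ;;
  esac
  printf '%s\n' "$cmd"
}

trtexec_cmd gpt2-xl.onnx fp16
trtexec_cmd gpt2-xl-i8.onnx int8
```

The int8 variant is paired with `gpt2-xl-i8.onnx` only, since that is the file the list above designates for int8+fp32 engines.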

	## Metrics:

	### GPT2-XL

	| Metric | TensorRT INT8+FP32 | torch FP16 |
	|---|:---:|:---:|
	| Lambada Acc | 72.11% | 71.43% |

	### Test environment

	* GPU RTX 4090
	* CPU 11th Gen Intel(R) Core(TM) i7-11700K
	* TensorRT 8.5.3.1
	* pytorch 1.13.1+cu116

	## Latency:

	### GPT2-XL

	| Input sequence length | Number of generated tokens | TensorRT INT8+FP32 (ms) | torch FP16 (ms) | Acceleration |
	|:---:|:---:|:---:|:---:|:---:|
	| 64 | 64 | 462 | 1190 | 2.58 |
	| 64 | 128 | 920 | 2360 | 2.54 |
	| 64 | 256 | 1890 | 4710 | 2.54 |
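
To make the table easier to compare across generation lengths, the sketch below derives latency per generated token and the torch/TensorRT ratio from the published numbers. The rows are copied from the table above; the function name `report` is hypothetical. Because the published latencies appear to be rounded, a ratio recomputed here may differ slightly from the Acceleration column, which was presumably calculated from unrounded measurements (an assumption).

```shell
# Rows copied from the latency table: generated tokens, TensorRT ms, torch ms.
rows="64 462 1190
128 920 2360
256 1890 4710"

# Hypothetical helper: print per-token latency and the torch/TensorRT ratio per row.
report() {
  printf '%s\n' "$rows" | awk '{
    printf "%d tokens: TensorRT %.2f ms/token, torch %.2f ms/token, ratio %.2f\n",
           $1, $2 / $1, $3 / $1, $3 / $2
  }'
}

report
```

Per-token latency stays roughly constant across the three rows for both backends, which is why the speedup is nearly independent of the number of generated tokens.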

	### Test environment

	* GPU RTX 4090
	* CPU 11th Gen Intel(R) Core(TM) i7-11700K
	* TensorRT 8.5.3.1
	* pytorch 1.13.1+cu116

	## How to use

	Examples of inference and an accuracy test are [published on GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers):
	```shell
	git clone https://github.com/ENOT-AutoDL/ENOT-transformers
	```
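
The examples repository does not necessarily include the ONNX files themselves, so they may need to be fetched from this model repository. The sketch below is one possible way to do that, assuming the model card is hosted at `https://huggingface.co/ENOT-AutoDL/gpt2-tensorrt` and that Git LFS is installed. The function name `fetch_models` and the default target directory are hypothetical.

```shell
# Hypothetical helper: clone this model repository, including its large ONNX files.
# Assumption: the repository URL below matches this model card.
fetch_models() {
  target="${1:-gpt2-tensorrt}"   # destination directory (default is an assumption)
  git lfs install                # enable Git LFS so large files are downloaded
  git clone https://huggingface.co/ENOT-AutoDL/gpt2-tensorrt "$target"
}

# Usage (downloads several gigabytes, so it is left commented out):
# fetch_models ./gpt2-tensorrt
```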