Stable Diffusion XL 1.0 TensorRT
Introduction
This repository hosts the TensorRT versions (sdxl, sdxl-lcm, sdxl-lcmlora) of Stable Diffusion XL 1.0, created in collaboration with NVIDIA. The optimized versions give substantial improvements in speed and efficiency.
See the usage instructions for how to run the SDXL pipeline with the ONNX files hosted in this repository.
Model Description
- Developed by: Stability AI
- Model type: Diffusion-based text-to-image generative model
- License: CreativeML Open RAIL++-M License
- Model Description: This is a conversion of the SDXL base 1.0 and SDXL refiner 1.0 models for NVIDIA TensorRT optimized inference
Performance Comparison
Timings for 30 steps at 1024x1024
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|---|---|---|---|
| A10 | 9399 ms | 8160 ms | ~13% |
| A100 | 3704 ms | 2742 ms | ~26% |
| H100 | 2496 ms | 1471 ms | ~41% |
Image throughput for 30 steps at 1024x1024
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|---|---|---|---|
| A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
| A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
| H100 | 0.40 images/sec | 0.68 images/sec | ~70% |
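Both improvement columns are plain relative changes against the baseline measurement (lower is better for latency, higher is better for throughput). As an illustrative check against a few of the values above:

```python
# Relative improvement vs. baseline, as reported in the tables above (illustrative check only)
def improvement(baseline, optimized, lower_is_better=True):
    delta = (baseline - optimized) if lower_is_better else (optimized - baseline)
    return delta / baseline

print(f"A10 latency:     {improvement(9399, 8160):.0%}")         # ~13%
print(f"H100 latency:    {improvement(2496, 1471):.0%}")         # ~41%
print(f"A10 throughput:  {improvement(0.10, 0.12, False):.0%}")  # ~20%
print(f"H100 throughput: {improvement(0.40, 0.68, False):.0%}")  # ~70%
```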
Timings for Latent Consistency Model (LCM) version for 4 steps at 1024x1024

| Accelerator | CLIP | UNet | VAE | Total |
|---|---|---|---|---|
| A100 | 1.08 ms | 192.02 ms | 228.34 ms | 426.16 ms |
| H100 | 0.78 ms | 102.8 ms | 126.95 ms | 234.22 ms |
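As a rough point of comparison with the 30-step throughput table above, the 4-step LCM totals translate to the following approximate image rates (ignoring any overhead outside the measured CLIP/UNet/VAE stages):

```python
# Rough images/sec implied by the measured per-image totals above
# (overhead outside the CLIP/UNet/VAE stages is not included)
lcm_total_ms = {"A100": 426.16, "H100": 234.22}
for gpu, total in lcm_total_ms.items():
    print(f"{gpu}: ~{1000.0 / total:.1f} images/sec")  # A100: ~2.3, H100: ~4.3
```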
Usage Example
- Follow the setup instructions for launching a TensorRT NGC container.
```shell
git clone https://github.com/rajeevsrao/TensorRT.git
cd TensorRT
git checkout release/9.2
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash
```
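Before continuing, it can help to confirm that the GPU is actually visible inside the container. A minimal sanity check using the PyTorch that ships with the NGC image:

```python
# Quick sanity check inside the container: the GPU should be visible to PyTorch
import torch

assert torch.cuda.is_available(), "No CUDA device visible; check the --gpus flag on docker run"
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100-SXM4-80GB"
```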
- Download the SDXL TensorRT files from this repo
```shell
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt
cd stable-diffusion-xl-1.0-tensorrt
git lfs pull
cd ..
```
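The repository contains separate ONNX directories for the base/refiner and LCM variants. A small optional check that the LFS pull actually fetched them; the directory names are taken from the commands below, and it assumes the ONNX files sit directly under those folders:

```python
# Verify the ONNX model directories referenced by the commands below were pulled via LFS
from pathlib import Path

repo = Path("stable-diffusion-xl-1.0-tensorrt")
for sub in ("sdxl-1.0-base", "sdxl-1.0-refiner", "lcm", "lcmlora"):
    onnx_files = list((repo / sub).rglob("*.onnx"))
    print(f"{sub}: {len(onnx_files)} .onnx file(s)")
```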
- Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt
```
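A quick way to confirm the TensorRT Python package installed correctly before building engines:

```python
# Confirm the TensorRT Python bindings are importable and report the installed version
import tensorrt

print(tensorrt.__version__)
```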
- Perform TensorRT optimized inference:
SDXL
The first invocation produces plan files in `engine_xl_base` and `engine_xl_refiner` specific to the accelerator being run on; these are reused for later invocations.

```shell
python3 demo_txt2img_xl.py \
  "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --build-static-batch \
  --use-cuda-graph \
  --num-warmup-runs 1 \
  --width 1024 \
  --height 1024 \
  --denoising-steps 30 \
  --onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \
  --onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner
```
SDXL-LCM
The first invocation produces plan files in the directory given by `--engine-dir`, specific to the accelerator being run on; these are reused for later invocations.

```shell
python3 demo_txt2img_xl.py \
  "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --version=xl-1.0 \
  --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm \
  --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcm-nocfg \
  --scheduler LCM \
  --denoising-steps 4 \
  --guidance-scale 0.0 \
  --seed 42
```
SDXL-LCMLORA
The first invocation produces plan files in the directory given by `--engine-dir`, specific to the accelerator being run on; these are reused for later invocations.

```shell
python3 demo_txt2img_xl.py \
  "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --version=xl-1.0 \
  --onnx-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcmlora \
  --engine-dir /workspace/stable-diffusion-xl-1.0-tensorrt/lcm/engine-sdxl-lcmlora-nocfg \
  --scheduler LCM \
  --lora-path latent-consistency/lcm-lora-sdxl \
  --lora-scale 1.0 \
  --denoising-steps 4 \
  --guidance-scale 0.0 \
  --seed 42
```
Model tree for stabilityai/stable-diffusion-xl-1.0-tensorrt
- Base model: stabilityai/stable-diffusion-xl-base-1.0