---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
language:
- en
tags:
- stable-diffusion
- stable-diffusion-xl
- tensorrt
- text-to-image
---
# Stable Diffusion XL 1.0 TensorRT
## Introduction
This repository hosts the TensorRT versions of **Stable Diffusion XL 1.0** created in collaboration with [NVIDIA](https://huggingface.co/nvidia). On the accelerators benchmarked below, the optimized versions reduce latency for 30 steps at 1024x1024 by roughly 13% to 41%.
See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository. The first invocation builds plan files in `engine_xl_base` and `engine_xl_refiner` that are specific to the accelerator it runs on; later invocations reuse them.
![examples](./examples.jpg)
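
Because the plan files are cached on disk, a quick look at the two engine directories shows whether the next run will have to build them. This is a minimal sketch: it assumes the directories sit in the current working directory, and `engine_cache_status` is an illustrative helper name, not part of the demo.

```shell
# Report whether each engine directory exists and contains at least one file.
# Assumption: the directories are created in the current working directory.
engine_cache_status() {
  for dir in engine_xl_base engine_xl_refiner; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "$dir: cached"
    else
      echo "$dir: missing (will be built on next run)"
    fi
  done
}

engine_cache_status
```

Since the engines are specific to the accelerator, removing both directories should force a rebuild on the next invocation, for example after moving to a different GPU model.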
## Model Description
- **Developed by:** Stability AI
- **Model type:** Diffusion-based text-to-image generative model
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) optimized inference
## Performance Comparison
#### Timings for 30 steps at 1024x1024
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|-------------|--------------------------|-----------------------------|------------------------|
| A10 | 9399 ms | 8160 ms | ~13% |
| A100 | 3704 ms | 2742 ms | ~26% |
| H100 | 2496 ms | 1471 ms | ~41% |
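
The percentage column is the relative reduction in latency, (baseline - optimized) / baseline. As a quick check, the sketch below recomputes it from the timings in the table using `awk`; the function name `latency_improvement` is illustrative.

```shell
# Relative latency reduction in percent: (baseline - optimized) / baseline * 100
latency_improvement() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

latency_improvement 9399 8160   # A10  -> 13.2
latency_improvement 3704 2742   # A100 -> 26.0
latency_improvement 2496 1471   # H100 -> 41.1
```
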
#### Image throughput for 30 steps at 1024x1024
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
|-------------|--------------------------|-----------------------------|------------------------|
| A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
| A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
| H100 | 0.40 images/sec | 0.68 images/sec | ~70% |
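
Throughput gains look larger than the latency reductions because the two tables describe the same speedup from opposite directions: a latency reduction of r corresponds to a throughput gain of r / (1 - r). The sketch below recomputes the gain from the rounded images/sec values in the table; `throughput_gain` is an illustrative name.

```shell
# Relative throughput gain in percent: (optimized - baseline) / baseline * 100
throughput_gain() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.0f\n", (opt - base) / base * 100 }'
}

throughput_gain 0.10 0.12   # A10  -> 20
throughput_gain 0.27 0.36   # A100 -> 33
throughput_gain 0.40 0.68   # H100 -> 70
```

Because the table values are rounded to two decimals, the result is coarse for slower accelerators: the A10 timings (9399 ms and 8160 ms) imply a gain nearer 15% than 20%.
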
## Usage Example
1. Follow the [setup instructions](https://github.com/rajeevsrao/TensorRT/blob/release/8.6/demo/Diffusion/README.md) to launch a TensorRT NGC container.
```shell
git clone https://github.com/rajeevsrao/TensorRT.git
cd TensorRT
git checkout release/8.6
docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:23.06-py3 /bin/bash
```
2. Download the SDXL TensorRT files from this repo
```shell
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt
cd stable-diffusion-xl-1.0-tensorrt
git lfs pull
cd ..
```
3. Install libraries and requirements
```shell
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
cd demo/Diffusion
pip3 install -r requirements.txt
```
4. Perform TensorRT optimized inference
```shell
python3 demo_txt2img_xl.py \
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
--build-static-batch \
--use-cuda-graph \
--num-warmup-runs 1 \
--width 1024 \
--height 1024 \
--denoising-steps 30 \
--onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \
--onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner
```
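
If step 2 ran without `git lfs` fully set up, the clone may contain small pointer stubs instead of the real model files, and the inference step would then likely fail when loading them. The sketch below lists any such stubs by looking for the Git LFS pointer header. It assumes the model files use the `.onnx` extension and that it runs from the directory containing the clone; `find_lfs_pointers` is an illustrative name.

```shell
# List .onnx files that are still Git LFS pointer stubs rather than real data.
# A pointer stub is a small text file that starts with the LFS spec header.
# Assumption: the model files in the clone end in .onnx.
find_lfs_pointers() {
  find "$1" -type f -name '*.onnx' | while IFS= read -r f; do
    if head -c 64 "$f" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
      echo "$f"
    fi
  done
}

repo=stable-diffusion-xl-1.0-tensorrt
if [ -d "$repo" ]; then
  find_lfs_pointers "$repo"   # no output means no pointer stubs were found
fi
```

Any path printed here can usually be fixed by re-running `git lfs pull` inside the clone.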