|
--- |
|
license: openrail++ |
|
base_model: stabilityai/stable-diffusion-xl-base-1.0 |
|
language: |
|
- en |
|
tags: |
|
- stable-diffusion |
|
- stable-diffusion-xl |
|
- tensorrt |
|
- text-to-image |
|
--- |
|
|
|
# Stable Diffusion XL 1.0 TensorRT |
|
|
|
## Introduction |
|
|
|
This repository hosts the TensorRT versions of **Stable Diffusion XL 1.0** created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions give substantial improvements in speed and efficiency. |
|
|
|
See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository. The first invocation produces plan files in `engine_xl_base` and `engine_xl_refiner` specific to the accelerator being run on and are reused for later invocations. |
|
|
|
![examples](./examples.jpg) |
|
|
|
## Model Description |
|
|
|
- **Developed by:** Stability AI |
|
- **Model type:** Diffusion-based text-to-image generative model |
|
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md) |
|
- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) optimized inference |
|
|
|
|
|
## Performance Comparison |
|
|
|
#### Timings for 30 steps at 1024x1024 |
|
|
|
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement | |
|
|-------------|--------------------------|-----------------------------|------------------------| |
|
| A10 | 9399 ms | 8160 ms | ~13% | |
|
| A100 | 3704 ms | 2742 ms | ~26% | |
|
| H100 | 2496 ms | 1471 ms | ~41% | |
|
|
|
#### Image throughput for 30 steps at 1024x1024 |
|
|
|
| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement | |
|
|-------------|--------------------------|-----------------------------|------------------------| |
|
| A10 | 0.10 images/sec | 0.12 images/sec | ~20% | |
|
| A100 | 0.27 images/sec | 0.36 images/sec | ~33% | |
|
| H100 | 0.40 images/sec | 0.68 images/sec | ~70% | |
|
|
|
|
|
## Usage Example |
|
|
|
1. Following the [setup instructions](https://github.com/rajeevsrao/TensorRT/blob/release/8.6/demo/Diffusion/README.md) on launching a TensorRT NGC container. |
|
```shell |
|
git clone https://github.com/rajeevsrao/TensorRT.git |
|
cd TensorRT |
|
git checkout release/8.6 |
|
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.06-py3 /bin/bash |
|
``` |
|
|
|
2. Download the SDXL TensorRT files from this repo |
|
```shell |
|
git lfs install |
|
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt |
|
cd stable-diffusion-xl-1.0-tensorrt |
|
git lfs pull |
|
cd .. |
|
``` |
|
|
|
3. Install libraries and requirements |
|
```shell |
|
python3 -m pip install --upgrade pip |
|
python3 -m pip install --upgrade tensorrt |
|
|
|
cd demo/Diffusion |
|
pip3 install -r requirements.txt |
|
``` |
|
|
|
4. Perform TensorRT optimized inference |
|
``` |
|
python3 demo_txt2img_xl.py \ |
|
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--num-warmup-runs 1 \ |
|
--width 1024 \ |
|
--height 1024 \ |
|
--denoising-steps 30 \ |
|
--onnx-base-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base \ |
|
--onnx-refiner-dir /workspace/stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-refiner |
|
``` |