|
--- |
|
license: mit |
|
pipeline_tag: text-to-image |
|
tags: |
|
- diffusion |
|
- efficient |
|
- quantization |
|
- StableDiffusionXLPipeline |
|
- Diffusers |
|
base_model: |
|
- stabilityai/sdxl-turbo |
|
--- |
|
|
|
# MixDQ Model Card |
|
|
|
## Model Description |
|
|
|
MixDQ is a mixed-precision quantization method that reduces the memory and computational cost of text-to-image diffusion models while preserving generation quality.

It supports few-step diffusion models (e.g., SDXL-turbo, LCM-LoRA), yielding diffusion models that are both fast and small. An efficient CUDA kernel implementation is provided for practical resource savings.
|
|
|
<img src="https://github.com/A-suozhang/MyPicBed/raw/master/img/mixdq_model_card_0.jpg" width="600"> |
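
For intuition, the sketch below fake-quantizes a weight tensor at two bit-widths. It is an illustrative toy only, not MixDQ's actual quantizer, which additionally allocates per-layer bit-widths via sensitivity analysis:

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Symmetric uniform quantization: round onto an n_bits integer grid,
    # then dequantize back to float to simulate the precision loss.
    qmax = 2 ** (n_bits - 1) - 1               # e.g., 127 for 8 bits
    scale = w.abs().max() / qmax               # per-tensor scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

# Mixed precision: sensitive layers keep more bits (e.g., 8), robust
# layers can drop lower (e.g., 5) for extra memory savings.
w = torch.randn(256, 256)
for bits in (8, 5):
    err = (w - fake_quantize(w, bits)).abs().mean()
    print(f"W{bits}: mean abs quantization error {err:.5f}")
```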
|
|
|
|
|
## Model Sources |
|
|
|
For more information, please refer to:
|
|
|
- Project page: [https://a-suozhang.xyz/mixdq.github.io/](https://a-suozhang.xyz/mixdq.github.io/)

- arXiv paper: [https://arxiv.org/abs/2405.17873](https://arxiv.org/abs/2405.17873)

- GitHub repository: [https://github.com/A-suozhang/MixDQ](https://github.com/A-suozhang/MixDQ)
|
|
|
## Evaluation |
|
|
|
We evaluate MixDQ with several metrics: FID (image fidelity, lower is better), CLIPScore (image-text alignment), and ImageReward (human preference). MixDQ achieves W8A8 quantization without performance loss; the differences between images generated by the quantized model and the FP16 model are negligible.
|
|
|
| Method | FID (↓) | CLIPScore (↑) | ImageReward (↑) |
|------------|---------|---------------|-----------------|
| FP16 | 17.15 | 0.2722 | 0.8631 |
| MixDQ-W8A8 | 17.03 | 0.2703 | 0.8415 |
| MixDQ-W5A8 | 17.23 | 0.2697 | 0.8307 |
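
For reference, CLIPScore can be computed with off-the-shelf tooling such as `torchmetrics`. The snippet below is a hedged sketch, not necessarily the exact evaluation code behind the table, and the random images stand in for generated samples:

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# CLIPScore measures image-text alignment via CLIP embedding similarity.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Placeholder batch; in practice, pass images generated by the pipeline.
images = (torch.rand(4, 3, 512, 512) * 255).to(torch.uint8)
prompts = ["A black Honda motorcycle parked in front of a garage."] * 4

# torchmetrics scales the cosine similarity by 100; divide to match
# the 0-1 scale reported in the table above.
score = metric(images, prompts) / 100
print(f"CLIPScore: {score.item():.4f}")
```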
|
|
|
## Usage |
|
|
|
|
|
Install the prerequisites for MixDQ:
|
```shell |
|
# MixDQ requires Python 3.8, 3.9, or 3.10
|
pip install -i https://pypi.org/simple/ mixdq-extension |
|
``` |
|
|
|
Load and run the quantized pipeline:
|
```python
import torch
from diffusers import DiffusionPipeline

# Load SDXL-turbo with the MixDQ custom pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo", custom_pipeline="nics-efc/MixDQ",
    torch_dtype=torch.float16, variant="fp16"
)
|
|
|
# Quantize the UNet (8-bit weights, 8-bit activations)
pipe.quantize_unet(
    w_bit=8,
    a_bit=8,
    bos=True,  # BOS-aware text-embedding quantization
)
|
|
|
# Optional: enable CUDA Graphs to accelerate inference
pipe.set_cuda_graph(
    run_pipeline=True,
)
|
|
|
# Profile the memory usage and latency of the pipeline (or of the UNet alone)
pipe.run_for_test(
    device="cuda",
    output_type="pil",
    run_pipeline=True,
    path="pipeline_test.png",
    profile=True
)
|
# After execution finishes, a JSON report is written to the log/sdxl folder.
# The report can be opened with TensorBoard to examine the profiling results:
#     tensorboard --logdir=./log
|
|
|
# Move the pipeline to GPU and generate an image
pipe = pipe.to("cuda")
prompt = "A black Honda motorcycle parked in front of a garage."
image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save('mixdq_pipeline.png')
|
``` |
|
|
|
|
|
|
|
Performance measured on an NVIDIA RTX 4080:
|
|
|
| UNet Latency (ms) | No CUDA Graph | With CUDA Graph |
|-------------------|---------------|-----------------|
| FP16 version | 44.6 | 36.1 |
| Quantized version | 59.1 | 24.9 |
| Speedup (vs. FP16) | 0.75× | 1.45× |
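
The latency numbers above can be approximated with standard CUDA-event timing. The sketch below times the UNet in isolation; the input shapes are assumptions for SDXL-turbo at 512×512 output (64×64 latents, batch size 1), and it assumes the quantized UNet is still exposed as `pipe.unet`. It is a generic benchmark sketch, not the exact script used for the table:

```python
import torch

def time_unet(unet, n_warmup=10, n_iters=50, device="cuda"):
    # Dummy SDXL-shaped inputs (assumed shapes for 512x512 generation).
    sample = torch.randn(1, 4, 64, 64, dtype=torch.float16, device=device)
    timestep = torch.tensor([999], device=device)
    prompt_embeds = torch.randn(1, 77, 2048, dtype=torch.float16, device=device)
    added_cond = {
        "text_embeds": torch.randn(1, 1280, dtype=torch.float16, device=device),
        "time_ids": torch.randn(1, 6, dtype=torch.float16, device=device),
    }
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(n_warmup):  # warm up kernels and allocator
            unet(sample, timestep, encoder_hidden_states=prompt_embeds,
                 added_cond_kwargs=added_cond)
        torch.cuda.synchronize()
        start.record()
        for _ in range(n_iters):
            unet(sample, timestep, encoder_hidden_states=prompt_embeds,
                 added_cond_kwargs=added_cond)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / n_iters  # average ms per forward pass

print(f"UNet latency: {time_unet(pipe.unet):.1f} ms")
```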