---
license: mit
pipeline_tag: text-to-image
tags:
- diffusion
- efficient
- quantization
- StableDiffusionXLPipeline
- Diffusers
base_model:
- stabilityai/sdxl-turbo
---
# MixDQ Model Card
## Model Description
MixDQ is a mixed-precision quantization method that compresses the memory and computational cost of text-to-image diffusion models while preserving generation quality.
It supports few-step diffusion models (e.g., SDXL-Turbo, LCM-LoRA), yielding diffusion models that are both fast and compact. An efficient CUDA kernel implementation is provided for practical resource savings.
<img src="https://github.com/A-suozhang/MyPicBed/raw/master/img/mixdq_model_card_0.jpg" width="600">
## Model Sources
For more information, please refer to:
- Project Page: [https://a-suozhang.xyz/mixdq.github.io/](https://a-suozhang.xyz/mixdq.github.io/).
- Arxiv paper: [https://arxiv.org/abs/2405.17873](https://arxiv.org/abs/2405.17873)
- Github Repository: [https://github.com/A-suozhang/MixDQ](https://github.com/A-suozhang/MixDQ)
## Evaluation
We evaluate the MixDQ model using various metrics, including FID (fidelity), CLIPScore (image-text alignment), and ImageReward (human preference). MixDQ can achieve W8A8 quantization without performance loss. The differences between images generated by MixDQ and those generated by FP16 models are negligible.
| Method | FID (↓) | CLIPScore (↑) | ImageReward (↑) |
|------------|---------|-----------|-------------|
| FP16 | 17.15 | 0.2722 | 0.8631 |
| MixDQ-W8A8 | 17.03 | 0.2703 | 0.8415 |
| MixDQ-W5A8 | 17.23 | 0.2697 | 0.8307 |
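To make the "W8A8" / "W5A8" notation concrete (8-bit or 5-bit weights, 8-bit activations), here is a minimal sketch of symmetric uniform quantization of a single tensor. This is only an illustration of the basic quantize/dequantize round trip, not the MixDQ algorithm itself, which additionally assigns mixed bit-widths across layers:

```python
import torch

def quantize_symmetric(x: torch.Tensor, n_bits: int = 8):
    """Symmetric uniform quantization with one per-tensor scale.
    Illustrative only; MixDQ uses mixed per-layer bit-widths."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 127 for 8 bits
    scale = x.abs().max() / qmax           # single scale for the tensor
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_symmetric(w, n_bits=8)
w_hat = dequantize(q, s)
# The round-trip error is bounded by half a quantization step (scale / 2)
print((w - w_hat).abs().max())
```

Lower weight bit-widths (e.g. W5) shrink memory further at the cost of a coarser grid, which is why the W5A8 metrics above degrade slightly relative to W8A8.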
## Usage
Install the prerequisites for MixDQ:
```shell
# MixDQ requires Python 3.8, 3.9, or 3.10
pip install -i https://pypi.org/simple/ mixdq-extension
```
Run the pipeline:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
"stabilityai/sdxl-turbo", custom_pipeline="nics-efc/MixDQ",
torch_dtype=torch.float16, variant="fp16"
)
# quantize the UNet
pipe.quantize_unet(
w_bit = 8,
a_bit = 8,
bos=True,
)
# The set_cuda_graph call is optional and enables CUDA Graph acceleration
pipe.set_cuda_graph(
run_pipeline = True,
)
# test the memory usage and latency of the pipeline or the UNet
pipe.run_for_test(
device="cuda",
output_type="pil",
run_pipeline=True,
path="pipeline_test.png",
profile=True
)
# After execution finishes, a JSON report is written under the log/sdxl folder.
# Open it with TensorBoard to examine the profiling results:
#     tensorboard --logdir=./log
# run the pipeline
pipe = pipe.to("cuda")
prompts = "A black Honda motorcycle parked in front of a garage."
image = pipe(prompts, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save('mixdq_pipeline.png')
```
Performance tested on an NVIDIA 4080 GPU:
| UNet Latency (ms) | No CUDA Graph | With CUDA Graph |
|-------------------|---------------|-----------------|
| FP16 version | 44.6 | 36.1 |
| Quantized version | 59.1 | 24.9 |
| Speedup (vs. FP16) | 0.75× | 1.45× |
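The speedup row follows directly from the latencies above, assuming speedup is computed as FP16 latency divided by quantized latency within the same column (quantization alone adds kernel-launch overhead; combined with CUDA graphs it pays off):

```python
# UNet latencies in ms, copied from the table above
fp16  = {"no_cuda_graph": 44.6, "cuda_graph": 36.1}
quant = {"no_cuda_graph": 59.1, "cuda_graph": 24.9}

speedup = {mode: fp16[mode] / quant[mode] for mode in fp16}
for mode, s in speedup.items():
    print(f"{mode}: {s:.2f}x")
# no_cuda_graph: 0.75x, cuda_graph: 1.45x
```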