---
license: mit
pipeline_tag: text-to-image
tags:
- diffusion
- efficient
- quantization
- StableDiffusionXLPipeline
- Diffusers
base_model:
- stabilityai/sdxl-turbo
---
# MixDQ Model Card
## Model Description
MixDQ is a mixed-precision quantization method that compresses the memory and computational cost of text-to-image diffusion models while preserving generation quality.
It supports few-step diffusion models (e.g., SDXL-Turbo, LCM-LoRA), yielding diffusion models that are both fast and compact. An efficient CUDA kernel implementation is provided for practical resource savings.
<img src="https://github.com/A-suozhang/MyPicBed/raw/master/img/mixdq_model_card_0.jpg" width="600">
## Model Sources
For more information, please refer to:
- Project Page: [https://a-suozhang.xyz/mixdq.github.io/](https://a-suozhang.xyz/mixdq.github.io/).
- Arxiv paper: [https://arxiv.org/abs/2405.17873](https://arxiv.org/abs/2405.17873)
- Github Repository: [https://github.com/A-suozhang/MixDQ](https://github.com/A-suozhang/MixDQ)
## Evaluation
We evaluate the MixDQ model using various metrics, including FID (fidelity), CLIPScore (image-text alignment), and ImageReward (human preference). MixDQ can achieve W8A8 quantization without performance loss. The differences between images generated by MixDQ and those generated by FP16 models are negligible.
| Method | FID (↓) | CLIPScore (↑) | ImageReward (↑) |
|------------|---------|-----------|-------------|
| FP16 | 17.15 | 0.2722 | 0.8631 |
| MixDQ-W8A8 | 17.03 | 0.2703 | 0.8415 |
| MixDQ-W5A8 | 17.23 | 0.2697 | 0.8307 |
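To make the "W8A8" / "W5A8" notation concrete (8-bit or 5-bit weights, 8-bit activations), here is a minimal sketch of symmetric uniform quantization of a single tensor. This is only an illustration of the basic quantize/dequantize round trip, not the MixDQ algorithm itself, which additionally assigns mixed bit-widths across layers:

```python
import torch

def quantize_symmetric(x: torch.Tensor, n_bits: int = 8):
    """Symmetric uniform quantization with one per-tensor scale.
    Illustrative only; MixDQ uses mixed per-layer bit-widths."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 127 for 8 bits
    scale = x.abs().max() / qmax           # single scale for the tensor
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_symmetric(w, n_bits=8)
w_hat = dequantize(q, s)
# The round-trip error is bounded by half a quantization step (scale / 2)
print((w - w_hat).abs().max())
```

Lower weight bit-widths (e.g. W5) shrink memory further at the cost of a coarser grid, which is why the W5A8 metrics above degrade slightly relative to W8A8.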
## Usage
Install the prerequisites for MixDQ:
```shell
# MixDQ requires Python 3.8, 3.9, or 3.10
pip install -i https://pypi.org/simple/ mixdq-extension
```
Run the pipeline:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
"stabilityai/sdxl-turbo", custom_pipeline="nics-efc/MixDQ",
torch_dtype=torch.float16, variant="fp16"
)
# quantize the UNet
pipe.quantize_unet(
w_bit = 8,
a_bit = 8,
bos=True,
)
# The set_cuda_graph call is optional and enables CUDA Graph acceleration
pipe.set_cuda_graph(
run_pipeline = True,
)
# test the memory usage and latency of the pipeline or the UNet
pipe.run_for_test(
device="cuda",
output_type="pil",
run_pipeline=True,
path="pipeline_test.png",
profile=True
)
# After execution finishes, a JSON report is written under the log/sdxl folder.
# Open it with TensorBoard to examine the profiling results:
#     tensorboard --logdir=./log
# run the pipeline
pipe = pipe.to("cuda")
prompts = "A black Honda motorcycle parked in front of a garage."
image = pipe(prompts, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save('mixdq_pipeline.png')
```
Performance tested on an NVIDIA 4080 GPU:
| UNet Latency (ms) | No CUDA Graph | With CUDA Graph |
|-------------------|---------------|-----------------|
| FP16 version | 44.6 | 36.1 |
| Quantized version | 59.1 | 24.9 |
| Speedup (vs. FP16) | 0.75× | 1.45× |
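The speedup row follows directly from the latencies above, assuming speedup is computed as FP16 latency divided by quantized latency within the same column (quantization alone adds kernel-launch overhead; combined with CUDA graphs it pays off):

```python
# UNet latencies in ms, copied from the table above
fp16  = {"no_cuda_graph": 44.6, "cuda_graph": 36.1}
quant = {"no_cuda_graph": 59.1, "cuda_graph": 24.9}

speedup = {mode: fp16[mode] / quant[mode] for mode in fp16}
for mode, s in speedup.items():
    print(f"{mode}: {s:.2f}x")
# no_cuda_graph: 0.75x, cuda_graph: 1.45x
```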