---
license: mit
pipeline_tag: text-to-image
tags:
- diffusion
- efficient
- quantization
- StableDiffusionXLPipeline
- Diffusers
base_model:
- stabilityai/sdxl-turbo
---

# MixDQ Model Card

## Model Description

MixDQ is a mixed-precision quantization method that reduces the memory and computational cost of text-to-image diffusion models while preserving generation quality.
It supports few-step diffusion models (e.g., SDXL-turbo, LCM-lora), making it possible to build diffusion models that are both fast and small.
An efficient CUDA kernel implementation is provided so that the savings are realized in practice.

## Model Sources

For more information, please refer to:

- Project Page: [https://a-suozhang.xyz/mixdq.github.io/](https://a-suozhang.xyz/mixdq.github.io/)
- arXiv paper: [https://arxiv.org/abs/2405.17873](https://arxiv.org/abs/2405.17873)
- GitHub Repository: [https://github.com/A-suozhang/MixDQ](https://github.com/A-suozhang/MixDQ)

## Evaluation

We evaluate the MixDQ model using several metrics: FID (fidelity), CLIPScore (image-text alignment), and ImageReward (human preference).
MixDQ can achieve W8A8 quantization with almost no performance loss. The differences between images generated by MixDQ and those generated by FP16 models are negligible.


| Method     | FID (↓) | CLIPScore (↑) | ImageReward (↑) |
|------------|---------|---------------|-----------------|
| FP16       | 17.15   | 0.2722        | 0.8631          |
| MixDQ-W8A8 | 17.03   | 0.2703        | 0.8415          |
| MixDQ-W5A8 | 17.23   | 0.2697        | 0.8307          |

## Usage

Install the prerequisite for MixDQ:

```shell
# Python versions supported by mixdq: 3.8, 3.9, 3.10
pip install -i https://pypi.org/simple/ mixdq-extension
```

Run the pipeline:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo", custom_pipeline="nics-efc/MixDQ",
    torch_dtype=torch.float16, variant="fp16"
)

# Quantize the UNet (8-bit weights, 8-bit activations)
pipe.quantize_unet(
    w_bit=8,
    a_bit=8,
    bos=True,
)

# set_cuda_graph is optional and is used for acceleration
pipe.set_cuda_graph(
    run_pipeline=True,
)

# Test the memory usage and latency of the pipeline or the UNet
pipe.run_for_test(
    device="cuda",
    output_type="pil",
    run_pipeline=True,
    path="pipeline_test.png",
    profile=True
)
# After execution finishes, a JSON report is written under the log/sdxl folder.
# Open it with TensorBoard to examine the profiling results:
#   tensorboard --logdir=./log

# Run the pipeline
pipe = pipe.to("cuda")
prompts = "A black Honda motorcycle parked in front of a garage."
image = pipe(prompts, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("mixdq_pipeline.png")
```

Performance tested on NVIDIA 4080:

| UNet Latency (ms) | No CUDA Graph | With CUDA Graph |
|-------------------|---------------|-----------------|
| FP16 version      | 44.6          | 36.1            |
| Quantized version | 59.1          | 24.9            |
| Speedup           | 0.75          | 1.45            |
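
The speedup row in the latency table is the FP16 latency divided by the quantized latency in each column. The following is a minimal sketch that recomputes it from the reported numbers; the names `latency_ms` and `speedup` are illustrative and are not part of the `mixdq-extension` package.

```python
# UNet latencies in milliseconds, copied from the table above.
latency_ms = {
    "No CUDA Graph": {"fp16": 44.6, "quantized": 59.1},
    "With CUDA Graph": {"fp16": 36.1, "quantized": 24.9},
}


def speedup(fp16_ms: float, quantized_ms: float) -> float:
    """Return the FP16 latency divided by the quantized latency."""
    return fp16_ms / quantized_ms


# Speedup per setting, rounded to two decimals as in the table.
speedups = {
    setting: round(speedup(t["fp16"], t["quantized"]), 2)
    for setting, t in latency_ms.items()
}

for setting, value in speedups.items():
    print(f"{setting}: {value:.2f}x")
```

A value below 1 means the quantized UNet is slower than FP16 in that setting, which is the case without CUDA Graph. By the same arithmetic, the quantized UNet with CUDA Graph is roughly 1.79x faster than the FP16 UNet without it (44.6 / 24.9).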