MixDQ Model Card

Model Description

MixDQ is a mixed-precision quantization method that reduces the memory and computational cost of text-to-image diffusion models while preserving generation quality. It supports few-step diffusion models (e.g., SDXL-turbo, LCM-lora), so the resulting models are both fast and small. An efficient CUDA kernel implementation is provided for practical resource savings.
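
To illustrate why bit-width matters, here is a minimal sketch of generic symmetric uniform quantization in plain Python. It is not the MixDQ algorithm or its CUDA kernels. The function names and the sample weights are hypothetical and chosen only for illustration. It shows that the rounding error grows as the bit-width shrinks, which is the trade-off a mixed-precision scheme has to manage.

```python
def quantize_dequantize(values, n_bits):
    """Generic symmetric per-tensor quantization, then dequantization.

    Illustrative only: this is NOT the MixDQ method.
    """
    q_max = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    restored = []
    for v in values:
        q = round(v / scale)                   # map to the integer grid
        q = max(-q_max, min(q_max, q))         # clamp to the representable range
        restored.append(q * scale)             # map back to floating point
    return restored


def mean_abs_error(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights, made up for this example.
weights = [0.013, -0.42, 0.27, 0.8, -0.055, 0.61, -0.9, 0.1]

errors = {
    bits: mean_abs_error(weights, quantize_dequantize(weights, bits))
    for bits in (8, 5, 4)
}
for bits, err in errors.items():
    print(f"{bits}-bit mean absolute error: {err:.5f}")
```

In this toy setting the 8-bit error is the smallest and the 4-bit error the largest.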

Model Sources

For more information, please refer to:

Evaluation

We evaluate MixDQ with three metrics: FID (fidelity), CLIPScore (image-text alignment), and ImageReward (human preference). MixDQ achieves W8A8 quantization with little to no loss on these metrics, and the differences between images generated by MixDQ and those generated by FP16 models are negligible.

Method        FID (↓)   CLIPScore (↑)   ImageReward (↑)
FP16          17.15     0.2722          0.8631
MixDQ-W8A8    17.03     0.2703          0.8415
MixDQ-W5A8    17.23     0.2697          0.8307
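
To put the table above in perspective, the following sketch computes each metric's relative change against the FP16 baseline. The numbers are copied from the table. The `results` dictionary and the `relative_change` helper are hypothetical names introduced here for illustration.

```python
# Metric values copied from the evaluation table above.
results = {
    "FP16":       {"FID": 17.15, "CLIPScore": 0.2722, "ImageReward": 0.8631},
    "MixDQ-W8A8": {"FID": 17.03, "CLIPScore": 0.2703, "ImageReward": 0.8415},
    "MixDQ-W5A8": {"FID": 17.23, "CLIPScore": 0.2697, "ImageReward": 0.8307},
}


def relative_change(method, metric, baseline="FP16"):
    """Percent change of `metric` for `method` relative to the baseline."""
    base = results[baseline][metric]
    return 100.0 * (results[method][metric] - base) / base


for method in ("MixDQ-W8A8", "MixDQ-W5A8"):
    for metric in ("FID", "CLIPScore", "ImageReward"):
        print(f"{method} {metric}: {relative_change(method, metric):+.2f}%")
```

Under this calculation, every metric for both quantized configurations stays within about 4% of the FP16 value. Note that a negative change is an improvement for FID (lower is better) and a regression for CLIPScore and ImageReward (higher is better).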

Usage

Install the prerequisites for MixDQ:

  # MixDQ requires Python 3.8, 3.9, or 3.10
  pip install -i https://pypi.org/simple/ mixdq-extension
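
Before installing, it can help to confirm that the interpreter is one of the supported versions listed above. The check below is a small sketch using only the standard library. `SUPPORTED_VERSIONS` and `is_supported` are hypothetical names and are not part of the mixdq-extension package.

```python
import sys

# Python versions listed as supported in the install note above.
SUPPORTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if is_supported():
    print("Python version is supported.")
else:
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    print(f"Python {current} is not in the supported list (3.8, 3.9, 3.10).")
```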

Run the pipeline:

  import torch
  from diffusers import DiffusionPipeline

  pipe = DiffusionPipeline.from_pretrained(
      "stabilityai/sdxl-turbo", custom_pipeline="nics-efc/MixDQ",
      torch_dtype=torch.float16, variant="fp16"
  )

  # Quantize the UNet
  pipe.quantize_unet(
      w_bit=8,  # weight bit-width
      a_bit=8,  # activation bit-width
      bos=True,
  )

  # set_cuda_graph is optional; it is used for acceleration
  pipe.set_cuda_graph(
      run_pipeline = True,
  )

  # Test the memory usage and latency of the pipeline or the UNet
  pipe.run_for_test(
      device="cuda",
      output_type="pil",
      run_pipeline=True,
      path="pipeline_test.png",
      profile=True
  )
  '''
  After execution finishes, a profiling report in JSON format is written under the log/sdxl folder.
  The report can be opened with TensorBoard to examine the profiling results:
  tensorboard --logdir=./log
  '''

  # Run the pipeline on the GPU and save the generated image
  pipe = pipe.to("cuda")
  prompts = "A black Honda motorcycle parked in front of a garage."
  image = pipe(prompts, num_inference_steps=1, guidance_scale=0.0).images[0]  
  image.save('mixdq_pipeline.png')

Performance tested on NVIDIA 4080:

UNet Latency (ms)    No CUDA Graph    With CUDA Graph
FP16 version         44.6             36.1
Quantized version    59.1             24.9
Speedup              0.75×            1.45×
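
The speedup row follows from dividing the FP16 latency by the quantized latency in each column. The sketch below reproduces it from the numbers in the table. The `latency_ms` dictionary and the `speedup` helper are hypothetical names used only for this illustration.

```python
# UNet latencies in milliseconds, copied from the table above.
latency_ms = {
    "no_cuda_graph":   {"fp16": 44.6, "quantized": 59.1},
    "with_cuda_graph": {"fp16": 36.1, "quantized": 24.9},
}


def speedup(setting):
    """FP16 latency divided by quantized latency for one setting."""
    row = latency_ms[setting]
    return row["fp16"] / row["quantized"]


for setting in latency_ms:
    print(f"{setting}: {speedup(setting):.2f}x")
```

A value below 1 means the quantized UNet is slower than FP16 in that setting, which is the case here without CUDA Graph. With CUDA Graph enabled, the quantized UNet is faster.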