Aligned Diffusion Model via DPO

Diffusion model aligned using the DPO algorithm with the following reward models:

Closed-source VLMs: claude3-opus, gemini-1.5, gpt-4o, gpt-4v
Open-source VLM: internvl-1.5
Score model: hps-2.1
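As a rough illustration of how a reward model can feed DPO, the sketch below makes two assumptions this card does not state. First, each reward source reduces to a scalar score per image (a VLM judge might instead pick a winner directly). Second, training uses the Diffusion-DPO objective of Wallace et al. (2023), which compares per-sample denoising errors ||eps - eps_pred||^2 for the preferred (w) and rejected (l) images under the trainable UNet and a frozen reference UNet. The names `build_preference_pair` and `diffusion_dpo_loss` are hypothetical and not part of any released code.

```python
import math
from typing import Sequence, Tuple


def build_preference_pair(images: Sequence[str], scores: Sequence[float]) -> Tuple[str, str]:
    """Return (preferred, rejected) for two candidate images of the same prompt.

    Assumption: `scores` are scalar rewards (e.g. from an HPS-style scorer),
    where higher is better. Ties keep the first image as preferred.
    """
    if len(images) != 2 or len(scores) != 2:
        raise ValueError("expected exactly two images and two scores")
    if scores[0] >= scores[1]:
        return images[0], images[1]
    return images[1], images[0]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    model_err_w: Sequence[float],
    model_err_l: Sequence[float],
    ref_err_w: Sequence[float],
    ref_err_l: Sequence[float],
    beta: float,
) -> float:
    """Mean Diffusion-DPO loss over a batch of preference pairs (sketch).

    Each list holds per-sample denoising errors at a shared noise level:
    *_w for the preferred image, *_l for the rejected one, computed with the
    trainable UNet (model_*) and the frozen reference UNet (ref_*).
    """
    n = len(model_err_w)
    if n == 0 or not (n == len(model_err_l) == len(ref_err_w) == len(ref_err_l)):
        raise ValueError("all error lists must be non-empty and the same length")
    total = 0.0
    for mw, ml, rw, rl in zip(model_err_w, model_err_l, ref_err_w, ref_err_l):
        model_diff = mw - ml  # negative when the trainable UNet fits w better than l
        ref_diff = rw - rl    # the same gap measured for the frozen reference UNet
        # Positive when the trainable UNet favors w more strongly than the reference does
        inside = -0.5 * beta * (model_diff - ref_diff)
        total += -log_sigmoid(inside)
    return total / n


# Usage: the higher-scored image becomes the preferred sample
winner, loser = build_preference_pair(["img_a.png", "img_b.png"], [0.27, 0.31])
loss = diffusion_dpo_loss([0.1], [0.2], [0.2], [0.2], beta=10.0)  # beta is illustrative
```

When the trainable and reference errors are identical the loss equals log 2 (about 0.693); it drops below that as the trainable UNet improves on the preferred image relative to the reference. In the Diffusion-DPO formulation, `beta` also absorbs the timestep weighting, so its scale must be tuned per setup.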

How to Use

You can load the model and perform inference as follows:

import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

pretrained_model_name = "runwayml/stable-diffusion-v1-5"

# Load the DPO-aligned UNet from the checkpoint directory
dpo_unet = UNet2DConditionModel.from_pretrained(
    "path/to/checkpoint",
    subfolder="unet",
    torch_dtype=torch.float16,
).to("cuda")

# Load the base Stable Diffusion v1.5 pipeline and swap in the aligned UNet
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name, torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
pipeline.safety_checker = None  # disables the NSFW safety filter
pipeline.unet = dpo_unet

# Fix the random seed so sampling is reproducible
generator = torch.Generator(device="cuda").manual_seed(1)

prompt = "a pink flower"
gs = 7.5  # classifier-free guidance scale (7.5 is the diffusers default)

image = pipeline(prompt=prompt, generator=generator, guidance_scale=gs).images[0]
image.save("pink_flower.png")