Is it possible to use the refiner separately from the base model?
It is my understanding that the way to use the refiner is to first run the base model pipeline with output_type="latent" and then run the refiner. Before I run the refiner, I would like to make some modifications to the base image. More precisely, I need the image as a proper image, not as a Tensor. After I make my changes, is there still a way to use the refiner?
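For context, a rough sketch of the first step I have in mind (placeholder prompt and file names); since the base pipeline's default output_type is "pil", it can already return a proper image that I could modify before refining:

import torch
from diffusers import DiffusionPipeline

# SDXL base; the default output_type="pil" returns a PIL image rather than latents
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

image = base(prompt="tiger").images[0]  # a PIL.Image I can edit
image.save("base.png")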
By the sound of it you want to go base -> image -> modifications -> refiner, so you want to run the refiner on an image.
If so, you need to encode the image into latents and then run the refiner. The results of the refiner are fairly subtle, but it can be done:
import torch
from diffusers import DiffusionPipeline, AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from PIL import Image

# Load the (modified) image and turn it into a normalised tensor
image = Image.open('cat.png').convert('RGB')
image_processor = VaeImageProcessor()
pixels = image_processor.preprocess(image)
pixels = pixels.to(device="cuda")

# Encode the image into latents with the SDXL base VAE (run in fp32, see notes below)
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="vae",
    use_safetensors=True,
).to("cuda")
with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor

# Run the refiner directly on the encoded latents
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
    add_watermarker=False,
).to('cuda')

prompt = "tiger"
image = refiner(prompt=prompt, image=latents).images[0]
image.save('e2c.png')
Notes:
Despite running the refiner in fp16, I've run the VAE encode in 32-bit, as it doesn't work in 16-bit. There's a fixed fp16 VAE around, but I've not tried it, and encoding doesn't use a lot of memory anyway.
I actually ran this on a Mac; I've just changed the device from 'mps' to 'cuda' and stripped out some torch tweaks that were needed to make fp16 work on MPS.
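If you do want to try that fixed fp16 VAE, I believe the usual community checkpoint is madebyollin/sdxl-vae-fp16-fix (I haven't tested it, so treat the repo name as an assumption). It would slot into the encode step like this:

import torch
from diffusers import AutoencoderKL

# fp16-safe SDXL VAE (repo name assumed, untested here)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
).to("cuda")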
(Before / after comparison images attached.)