Different images while using same latents
Hi!
When generating two images from the same (duplicated) latents, I obtain two different images.
latents = torch.randn((1, 4, 128, 128), device='cuda').half().repeat(2, 1, 1, 1)
Digging into the code, it looks like the UNet predicts different noise even though it receives the same input twice.
Is this normal?
Where does this variation come from?
Thank you !!
Those are not two images but the text and non-text (unconditional) latents used for classifier-free guidance.
Thank you for your answer!! I am not sure I understand, though. Let me be more specific.
I am generating two images by setting
num_images_per_prompt = 2
in the StableDiffusionXLAdapterPipeline call.
I have a single prompt. I also pass the latents argument to the pipe, and it is the same for each image.
Therefore the inputs to the UNet are basically identical, yet the predicted noise differs between batch entries.
INPUT (latent_model_input):
torch.Size([4, 4, 128, 128])
tensor([[-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
[-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
[-0.2959, -0.2959, -0.2959, -0.2959, -0.2959],
[-0.2959, -0.2959, -0.2959, -0.2959, -0.2959]], device='cuda:0',
dtype=torch.float16)
OUTPUT (noise_pred):
torch.Size([4, 4, 128, 128])
tensor([[-0.2391, -0.1351, -0.1200, -0.1201, -0.1230],
[-0.2391, -0.1351, -0.1200, -0.1201, -0.1230],
[-0.2365, -0.1348, -0.1201, -0.1203, -0.1234],
[-0.2365, -0.1348, -0.1201, -0.1203, -0.1234]], device='cuda:0',
dtype=torch.float16)
Is there some source of randomness in the UNet pipeline?
Best,
Théo
Yeah, the first two entries in the batch are the non-text (unconditional) counterparts of the last two; they are only used for classifier-free guidance and are not returned as separate images.
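To make this concrete, here is a minimal sketch (not the actual pipeline code; the UNet call is replaced by a placeholder) of how diffusers-style pipelines typically batch latents for classifier-free guidance. With num_images_per_prompt = 2 and guidance enabled, the latents are duplicated once for the two images and again for the unconditional/conditional halves, giving the batch of 4 seen above:

```python
import torch

# Two images from the same starting latents, as in the question.
num_images_per_prompt = 2
latents = torch.randn((1, 4, 128, 128)).repeat(num_images_per_prompt, 1, 1, 1)

# With classifier-free guidance, the latents are duplicated: the first half
# is paired with the empty-prompt (unconditional) embeddings, the second
# half with the text embeddings. All 4 latent tensors are identical.
latent_model_input = torch.cat([latents] * 2)  # shape (4, 4, 128, 128)

# Placeholder for unet(latent_model_input, t, encoder_hidden_states=...).
# The real UNet sees identical latents but DIFFERENT text embeddings per
# half, which is why the two halves of noise_pred differ.
noise_pred = latent_model_input.clone()

# Guidance splits the prediction and recombines the halves; only the
# combined result drives the denoising step, so there are still only
# 2 images at the end.
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
guidance_scale = 7.5
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```

So the variation is not randomness in the UNet: within each half the predictions are identical (rows 1–2 match and rows 3–4 match in the dump above); the halves differ only because of the different text embeddings.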