Convert color images to grayscale
See the corresponding discussion at https://github.com/lllyasviel/ControlNet/discussions/561!
I have trained a ControlNet (214244a32 drop=0.5 mp=fp16 lr=1e-5) for 1.25 epochs by using a pointwise function to convert RGB to grayscale... which effectively makes it a pointless ControlNet 🤣
I wanted to see how fast it converges on a simple linear transformation. To emphasize again: it doesn't colorize grayscale images, it desaturates color images... which you might as well do in an image editor. It's the most ineffective way to make grayscale images. But it lets us evaluate the model very easily, and we can peer into the inner workings of ControlNet a bit. It's also a good baseline for inpainting (assuming 0% masking) and tells us which artefacts to expect in the unmasked area. I chose drop=0.5 because I assumed the ControlNet should pick up the "ignore the prompt" task very fast, similar to the desaturation task; it also lets us compare the influence of prompts and keeps the run comparable with inpainting. I don't think it would have converged faster without any prompts.
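For concreteness, the pointwise function is just the usual luma combination applied to every pixel independently and replicated back to three channels. A minimal sketch (using the BT.601 weights that cv2's default conversion also uses; the function name and exact preprocessing details are illustrative):

```python
import numpy as np

def rgb_to_gray_conditioning(rgb: np.ndarray) -> np.ndarray:
    """Pointwise linear map: every pixel (R, G, B) -> (Y, Y, Y) with
    Y = 0.299 R + 0.587 G + 0.114 B (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    y = rgb.astype(np.float32) @ weights        # H x W luminance
    y = np.clip(y, 0, 255).astype(np.uint8)
    return np.stack([y, y, y], axis=-1)         # keep 3 channels for the ControlNet input
```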
Training
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_batch_size=4 \
--gradient_accumulation_steps=8 \
--proportion_empty_prompts=0.5 \
--mixed_precision="fp16" \
--learning_rate=1e-5 \
--enable_xformers_memory_efficient_attention \
--use_8bit_adam \
--set_grads_to_none \
--seed=0
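A checkpoint can then be evaluated by loading it into the standard diffusers ControlNet pipeline and feeding it a grayscale conditioning image, roughly like this (checkpoint path, input file, and prompt are placeholders):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "path/to/checkpoint/controlnet", torch_dtype=torch.float16  # placeholder path
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Conditioning input: a 512x512 grayscale image saved as 3-channel RGB
cond = load_image("example_gray.png")  # placeholder file
out = pipe(
    "a photo, best quality",           # placeholder prompt
    image=cond,
    num_inference_steps=20,
    generator=torch.manual_seed(0),
).images[0]
out.save("desaturated_output.png")
```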
Image dataset
- laion2B-en aesthetics>=6.5 dataset
- --min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512
- Cleaned with `fastdup` default settings
- Data augmented with right-left flipped images
- Resulting in 214244 images
- Converted to grayscale with `cv2` (see the sketch below)
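
A rough sketch of the per-image preprocessing described above (center crop + resize to 512, right-left flip augmentation, `cv2` grayscale conditioning image); the function name and output handling are illustrative, since diffusers' train_controlnet.py only needs an image, a conditioning image and a caption per example:

```python
import cv2

def make_training_pairs(path: str, size: int = 512):
    """Center-crop + resize to 512, then build the grayscale conditioning image.
    Returns (target, conditioning) for the original and the flipped copy."""
    img = cv2.imread(path)                      # BGR, uint8
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    img = cv2.resize(img[y0:y0 + s, x0:x0 + s], (size, size), interpolation=cv2.INTER_AREA)

    pairs = []
    for im in (img, cv2.flip(img, 1)):          # original + right-left flip
        gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
        cond = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # back to 3 channels
        pairs.append((im, cond))
    return pairs
```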