Convert color images to grayscale
See the corresponding discussion at https://github.com/lllyasviel/ControlNet/discussions/561!
I have trained a ControlNet (214244a32 drop=0.5 mp=fp16 lr=1e-5) for 1.25 epochs by using a pointwise function to convert RGB to grayscale... which effectively makes it a pointless ControlNet 🤣
I wanted to see how fast it converges on a simple linear transformation. To emphasize again: it doesn't colorize grayscale images, it desaturates color images... which you might as well do in an image editor. It's the most ineffective way to make grayscale images. But it lets us evaluate the model very easily, and we can peer into the inner workings of ControlNet a bit. It's also a good baseline for inpainting (assuming 0% masking) and tells us which artefacts to expect in the unmasked area. I chose drop=0.5 because I assumed the ControlNet should pick up the "ignore the prompt" task very fast, similar to the desaturation task; it also lets us compare the influence of prompts and keeps the run comparable with inpainting. I don't think it would have converged faster without any prompts.
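For concreteness, the pointwise function is just the usual luma combination applied to every pixel independently and replicated back to three channels. A minimal sketch (using the BT.601 weights that cv2's default conversion also uses; the function name and exact preprocessing details are illustrative):

```python
import numpy as np

def rgb_to_gray_conditioning(rgb: np.ndarray) -> np.ndarray:
    """Pointwise linear map: every pixel (R, G, B) -> (Y, Y, Y) with
    Y = 0.299 R + 0.587 G + 0.114 B (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    y = rgb.astype(np.float32) @ weights        # H x W luminance
    y = np.clip(y, 0, 255).astype(np.uint8)
    return np.stack([y, y, y], axis=-1)         # keep 3 channels for the ControlNet input
```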
Training
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_batch_size=4 \
--gradient_accumulation_steps=8 \
--proportion_empty_prompts=0.5 \
--mixed_precision="fp16" \
--learning_rate=1e-5 \
--enable_xformers_memory_efficient_attention \
--use_8bit_adam \
--set_grads_to_none \
--seed=0
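A checkpoint can then be evaluated by loading it into the standard diffusers ControlNet pipeline and feeding it a grayscale conditioning image, roughly like this (checkpoint path, input file, and prompt are placeholders):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "path/to/checkpoint/controlnet", torch_dtype=torch.float16  # placeholder path
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Conditioning input: a 512x512 grayscale image saved as 3-channel RGB
cond = load_image("example_gray.png")  # placeholder file
out = pipe(
    "a photo, best quality",           # placeholder prompt
    image=cond,
    num_inference_steps=20,
    generator=torch.manual_seed(0),
).images[0]
out.save("desaturated_output.png")
```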
Image dataset
- laion2B-en aesthetics>=6.5 dataset
- --min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512
- Cleaned with `fastdup` default settings
- Data augmented with right-left flipped images
- Resulting in 214244 images
- Converted to grayscale with `cv2` (see the sketch below)
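
A rough sketch of the per-image preprocessing described above (center crop + resize to 512, right-left flip augmentation, `cv2` grayscale conditioning image); the function name and output handling are illustrative, since diffusers' train_controlnet.py only needs an image, a conditioning image and a caption per example:

```python
import cv2

def make_training_pairs(path: str, size: int = 512):
    """Center-crop + resize to 512, then build the grayscale conditioning image.
    Returns (target, conditioning) for the original and the flipped copy."""
    img = cv2.imread(path)                      # BGR, uint8
    h, w = img.shape[:2]
    s = min(h, w)
    y0, x0 = (h - s) // 2, (w - s) // 2
    img = cv2.resize(img[y0:y0 + s, x0:x0 + s], (size, size), interpolation=cv2.INTER_AREA)

    pairs = []
    for im in (img, cv2.flip(img, 1)):          # original + right-left flip
        gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
        cond = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # back to 3 channels
        pairs.append((im, cond))
    return pairs
```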