---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- controlnet
inference: false
---

# SDXL-controlnet: Canny

These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Canny conditioning. Some example images follow.

prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
![images_0](./cann-small-hf-ofice.png)

prompt: a woman, close up, detailed, beautiful, street photography, photorealistic, detailed, Kodak ektar 100, natural, candid shot
![images_1](./cann-small-woman.png)

prompt: megatron in an apocalyptic world ground, ruined city in the background, photorealistic
![images_2](./cann-small-megatron.png)

prompt: a couple watching sunset, 4k photo
![images_3](./cann-small-couple.png)

## Usage

Make sure to first install the libraries:

```bash
pip install accelerate transformers safetensors opencv-python diffusers
```

And then we're ready to go:

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers.utils import load_image
from PIL import Image
import torch
import numpy as np
import cv2

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

# Load the Canny ControlNet weights.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-small",
    torch_dtype=torch.float16,
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
# Load the pipeline from the SDXL base model the ControlNet was trained on.
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Build the 3-channel Canny edge map used as the conditioning image.
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)

images = pipe(
    prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
).images

images[0].save("hug_lab.png")
```

![images_10](./out_hug_lab_7.png)

For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
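
The usage snippet turns the single-channel output of `cv2.Canny` into a three-channel image before passing it to the pipeline. As a minimal sketch of just that step, the following uses a synthetic array in place of a real edge map. The helper name `to_three_channels` is hypothetical and is not part of diffusers or OpenCV.

```python
import numpy as np


def to_three_channels(edges: np.ndarray) -> np.ndarray:
    """Repeat a (H, W) edge map along a new last axis to get (H, W, 3)."""
    if edges.ndim != 2:
        raise ValueError("expected a 2-D single-channel edge map")
    return np.repeat(edges[:, :, None], 3, axis=2)


# Synthetic stand-in for a Canny result: uint8 values of 0 (no edge) or 255 (edge).
fake_edges = np.zeros((4, 6), dtype=np.uint8)
fake_edges[1, 2] = 255

stacked = to_three_channels(fake_edges)
print(stacked.shape)  # (4, 6, 3)
```

The result has the same values in every channel, which is the layout `Image.fromarray` needs to build an RGB conditioning image.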

### Training

Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).

#### Training data
This checkpoint was first trained for 20,000 steps on LAION 6a resized to a max minimum dimension of 384. It was then further trained for 20,000 steps on LAION 6a resized to a max minimum dimension of 1024 and then filtered to contain only images with a minimum dimension of 1024. We found that the further high-resolution fine-tuning was necessary for image quality.
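
One possible reading of "resized to a max minimum dimension of N" is that the shorter side of each image is scaled down to at most N pixels with the aspect ratio preserved, and that the later filter keeps only images whose shorter side is still at least 1024. The sketch below illustrates that interpretation only. Both function names are hypothetical, and the actual preprocessing code is not shown in this README.

```python
from typing import Tuple


def cap_min_dimension(width: int, height: int, cap: int) -> Tuple[int, int]:
    """Scale (width, height) down so the shorter side is at most `cap`.

    Images whose shorter side is already <= cap are returned unchanged.
    """
    shorter = min(width, height)
    if shorter <= cap:
        return width, height
    scale = cap / shorter
    return round(width * scale), round(height * scale)


def passes_high_res_filter(width: int, height: int, minimum: int = 1024) -> bool:
    """Assumed filter: keep images whose shorter side is at least `minimum`."""
    return min(width, height) >= minimum


print(cap_min_dimension(2048, 1536, 384))  # (512, 384)
print(cap_min_dimension(300, 200, 384))    # (300, 200)
print(passes_high_res_filter(*cap_min_dimension(4096, 2048, 1024)))  # True
print(passes_high_res_filter(800, 600))    # False
```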

#### Compute
One 8xA100 machine.

#### Batch size
Data parallel with a single-GPU batch size of 8, for a total batch size of 64.

#### Hyper Parameters
Constant learning rate of 1e-4, scaled by the batch size for a total learning rate of 64e-4.
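
The batch-size and learning-rate figures above follow from simple arithmetic, sketched here. The variable names are illustrative, and the linear scaling of the learning rate by the total batch size is taken from the description above rather than from the training script itself.

```python
import math

num_gpus = 8             # one 8xA100 machine
per_gpu_batch_size = 8   # single-GPU batch size
base_learning_rate = 1e-4

# Data parallelism: every GPU processes its own batch each step.
total_batch_size = num_gpus * per_gpu_batch_size

# Linear scaling of the base learning rate by the total batch size.
scaled_learning_rate = base_learning_rate * total_batch_size

print(total_batch_size)  # 64
print(math.isclose(scaled_learning_rate, 64e-4))  # True
```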

#### Mixed precision
fp16