Update README.md (#1)
- Update README.md (d7ca456f97abd31fe1f20fc5011293da5879eb51)
- Upload hug_lab_grid.png (3bd29be1e826556184d10de9a87c1cd80c8a030e)
- Update README.md (d173db61274c9a951b429e3e79398d49258123d7)
- Update README.md (cd9b02220b7fabec41132e80f0a7042a66ce2446)
- Update README.md (44dbf89a0d0ca604a04a31b960cd46c7c2cee038)
- .gitattributes +1 -0
- README.md +22 -19
- hug_lab_grid.png +3 -0
.gitattributes CHANGED

@@ -37,3 +37,4 @@ cann-small-couple.png filter=lfs diff=lfs merge=lfs -text
 cann-small-hf-ofice.png filter=lfs diff=lfs merge=lfs -text
 cann-small-megatron.png filter=lfs diff=lfs merge=lfs -text
 cann-small-woman.png filter=lfs diff=lfs merge=lfs -text
+hug_lab_grid.png filter=lfs diff=lfs merge=lfs -text
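The `.gitattributes` hunk adds the LFS filter line for the newly uploaded PNG; this is exactly the line that `git lfs track "hug_lab_grid.png"` would append. A small sketch of the resulting file, written with `printf` in a temp directory so git-lfs itself is not required:

```shell
# Reproduce the .gitattributes line this commit adds, without needing git-lfs.
# (`git lfs track "hug_lab_grid.png"` writes the same line.)
workdir="$(mktemp -d)"
printf 'hug_lab_grid.png filter=lfs diff=lfs merge=lfs -text\n' >> "$workdir/.gitattributes"
cat "$workdir/.gitattributes"
```

Files matching the pattern are then stored as LFS pointers rather than full blobs in git history.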
README.md CHANGED

@@ -10,9 +10,10 @@ tags:
 inference: false
 ---
 
-# SDXL-controlnet: Canny
+# Small SDXL-controlnet: Canny
 
-These are controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning.
+These are small controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with canny conditioning. This checkpoint is 7x smaller than the original XL ControlNet checkpoint.
+You can find some example images below.
 
 prompt: aerial view, a futuristic research complex in a bright foggy jungle, hard lighting
 ![images_0](./cann-small-hf-ofice.png)
@@ -46,19 +47,19 @@ import numpy as np
 import cv2
 
 prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
-negative_prompt =
+negative_prompt = "low quality, bad quality, sketches"
 
 image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
 
 controlnet_conditioning_scale = 0.5  # recommended for good generalization
 
 controlnet = ControlNetModel.from_pretrained(
-    "diffusers/controlnet-canny-sdxl-1.0",
+    "diffusers/controlnet-canny-sdxl-1.0-small",
     torch_dtype=torch.float16
 )
 vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
 pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-    "
+    "stabilityai/stable-diffusion-xl-base-1.0",
     controlnet=controlnet,
     vae=vae,
     torch_dtype=torch.float16,
@@ -73,33 +74,35 @@ image = Image.fromarray(image)
 
 images = pipe(
     prompt, negative_prompt=negative_prompt, image=image, controlnet_conditioning_scale=controlnet_conditioning_scale,
-
+).images
 
 images[0].save(f"hug_lab.png")
 ```
 
-![
+![hug_lab_grid](./hug_lab_grid.png)
 
 For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
 
+🚨 Please note that this checkpoint is experimental and there's a lot of room for improvement. We encourage the community to build on top of it, improve it, and provide us with feedback. 🚨
+
 ### Training
 
 Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
+You can refer to [this script](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py) for full disclosure.
+
+* This checkpoint does not perform distillation. We just use a smaller ControlNet initialized from the SDXL UNet. We encourage the community to try and conduct distillation too; [this resource](https://huggingface.co/blog/sd_distillation) might be of help in that regard.
+* To learn more about how the ControlNet was initialized, refer to [this code block](https://github.com/patil-suraj/muse-experiments/blob/f71e7e79af24509ddb4e1b295a1d0ef8d8758dc9/ctrlnet/train_controlnet_webdataset.py#L1020C1-L1042C36).
+* It does not have any attention blocks.
+* The model works pretty well on most conditioning images, but for more complex conditionings the bigger checkpoints might be better. We are still working on improving the quality of this checkpoint and looking for feedback from the community.
+* We recommend playing around with the `controlnet_conditioning_scale` and `guidance_scale` arguments for potentially better image generation quality.
 
 #### Training data
-
-It was then further trained for 20,000 steps on laion 6a resized to a max minimum dimension of 1024 and
-then filtered to contain only minimum 1024 images. We found the further high resolution finetuning was
-necessary for image quality.
+The model was trained on 3M images from the LAION aesthetic 6 plus subset, with a batch size of 256 for 50k steps and a constant learning rate of 3e-5.
 
 #### Compute
-
-#### Batch size
-Data parallel with a single gpu batch size of 8 for a total batch size of 64.
-
-#### Hyper Parameters
-Constant learning rate of 1e-4 scaled by batch size for total learning rate of 64e-4
+One 8xA100 machine
 
 #### Mixed precision
-
+FP16
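The README's Python hunks skip the canny preprocessing that turns the loaded logo into a conditioning image; only its tail, `image = Image.fromarray(image)`, survives as hunk context, and the real model card uses `cv2.Canny` for the edge map. As a dependency-free sketch of the same idea (a plain gradient-magnitude threshold stands in for cv2, and the threshold value is made up, not taken from this diff):

```python
# Stand-in for the elided canny step: grayscale in, single-channel edge map
# out, then stacked to three channels so Image.fromarray() would yield an
# RGB conditioning image. Pure Python; cv2.Canny is what the README really uses.

def edge_map(gray, threshold=50):
    """gray: 2D list of ints in [0, 255]. Returns a 2D list of 0/255 values."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences with clamped borders.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                out[y][x] = 255
    return out

def to_rgb(edges):
    """Stack one channel into three, mirroring np.concatenate([e, e, e], axis=2)."""
    return [[(v, v, v) for v in row] for row in edges]

# Tiny image: black left half, white right half -> vertical edge in the middle.
gray = [[0, 0, 255, 255] for _ in range(4)]
edges = edge_map(gray)
rgb = to_rgb(edges)
```

In the actual snippet the stacked array is handed to `PIL.Image.fromarray` and passed to the pipeline as the `image` argument.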
hug_lab_grid.png ADDED
Git LFS Details