adamelliotfields committed on
Commit
98afd85
1 Parent(s): 9e8b99d

ControlNet

Files changed (12)
  1. DOCS.md +28 -16
  2. README.md +5 -8
  3. app.css +5 -2
  4. app.py +133 -81
  5. lib/__init__.py +6 -0
  6. lib/annotators.py +25 -0
  7. lib/config.py +15 -2
  8. lib/inference.py +15 -1
  9. lib/loader.py +50 -6
  10. lib/pipelines.py +20 -1
  11. lib/utils.py +60 -0
  12. requirements.txt +3 -0
DOCS.md CHANGED
@@ -1,8 +1,8 @@
1
- ## Diffusion ZERO
2
 
3
  TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.
4
 
5
- ### Prompting
6
 
7
  Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.
8
 
@@ -19,13 +19,13 @@ This is the same syntax used in [InvokeAI](https://invoke-ai.github.io/InvokeAI/
19
  | `(blue)1.2` | `(blue:1.2)` |
20
  | `(blue)0.8` | `(blue:0.8)` |
21
 
22
- #### Arrays
23
 
24
  Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).
25
 
26
  > NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
27
 
28
- ### Models
29
 
30
  Each model checkpoint has a different aesthetic:
31
 
@@ -38,7 +38,7 @@ Each model checkpoint has a different aesthetic:
38
  * [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
39
  * [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized
40
 
41
- ### LoRA
42
 
43
  Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
44
 
@@ -47,7 +47,7 @@ Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
47
 
48
  > NB: The trigger words are automatically appended to the positive prompt for you.
49
 
50
- ### Embeddings
51
 
52
  Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:
53
 
@@ -57,13 +57,13 @@ Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/
57
 
58
  > NB: The trigger token is automatically appended to the negative prompt for you.
59
 
60
- ### Styles
61
 
62
  [Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.
63
 
64
  Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.
65
 
66
- #### Anime
67
 
68
  The `Anime: *` styles work best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:
69
 
@@ -73,13 +73,15 @@ The `Anime: *` styles work the best with Dreamshaper. When using the anime-speci
73
 
74
  Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.
75
 
76
- ### Scale
77
 
78
  Rescale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever](https://huggingface.co/ai-forever/Real-ESRGAN). Necessary for high-resolution images.
79
 
80
- ### Image-to-Image
81
 
82
- The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines. Either use the image input or select a generation from the gallery. To disable, simply clear the image input (the `x` overlay button).
 
 
83
 
84
  Denoising strength controls how much the generation differs from the input image: `0` is identical to the original, while `1` is a completely new image. You may also want to increase the number of inference steps, since only roughly `strength × steps` denoising steps actually run. Only applies to the image-to-image input.
85
 
@@ -89,9 +91,19 @@ In an image-to-image pipeline, the input image is used as the initial latent. Wi
89
 
90
  For capturing faces, enable `IP-Adapter Face` to use the full-face model. Use a high-quality input image that is mostly a face. You can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
91
 
92
- ### Advanced
 
93
 
94
- #### DeepCache
95
 
96
  [DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:
97
  * `1`: no caching (default)
@@ -99,14 +111,14 @@ For capturing faces, enable `IP-Adapter Face` to use the full-face model. You sh
99
  * `3`: balanced
100
  * `4`: more speed
101
 
102
- #### FreeU
103
 
104
  [FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions of the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
105
 
106
- #### Clip Skip
107
 
108
  When enabled, the last CLIP layer is skipped. Can sometimes improve image quality.
109
 
110
- #### Tiny VAE
111
 
112
  Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
 
1
+ # Diffusion ZERO
2
 
3
  TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.
4
 
5
+ ## Prompting
6
 
7
  Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.
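For reference, a minimal sketch of how Compel-weighted prompts feed a Diffusers SD 1.5 pipeline (the checkpoint name is just one of the models listed below, not the app's exact wiring):

```python
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Compel converts weighted syntax like `(blue)1.2` into prompt embeddings
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
positive = compel("portrait of a young adult woman, (freckles)1.2")
negative = compel("low quality, blurry")

# pad both embeddings to the same length before passing them to the pipeline
[positive, negative] = compel.pad_conditioning_tensors_to_same_length([positive, negative])
image = pipe(prompt_embeds=positive, negative_prompt_embeds=negative).images[0]
```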
8
 
 
19
  | `(blue)1.2` | `(blue:1.2)` |
20
  | `(blue)0.8` | `(blue:0.8)` |
21
 
22
+ ### Arrays
23
 
24
  Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).
25
 
26
  > NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
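The expansion is effectively a Cartesian product over the bracketed groups. A small illustrative helper (`expand_arrays` is hypothetical, not the app's actual implementation):

```python
import re
from itertools import product

def expand_arrays(prompt: str) -> list[str]:
    """Expand `[[a,b]]` groups into the Cartesian product of prompts."""
    groups = re.findall(r"\[\[(.*?)\]\]", prompt)
    if not groups:
        return [prompt]
    options = [[opt.strip() for opt in group.split(",")] for group in groups]
    prompts = []
    for combo in product(*options):
        expanded = prompt
        for choice in combo:
            # replace the next remaining [[...]] group with this choice
            expanded = re.sub(r"\[\[.*?\]\]", choice, expanded, count=1)
        prompts.append(expanded)
    return prompts

print(expand_arrays("an adult [[blonde,brunette]] [[man,woman]]"))
# ['an adult blonde man', 'an adult blonde woman',
#  'an adult brunette man', 'an adult brunette woman']
```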
27
 
28
+ ## Models
29
 
30
  Each model checkpoint has a different aesthetic:
31
 
 
38
  * [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
39
  * [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized
40
 
41
+ ## LoRA
42
 
43
  Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
44
 
 
47
 
48
  > NB: The trigger words are automatically appended to the positive prompt for you.
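In Diffusers this is typically done with `load_lora_weights` and `set_adapters`; a sketch (file paths and adapter names are illustrative, not the app's exact wiring):

```python
# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
pipe.load_lora_weights("path/to/first_lora.safetensors", adapter_name="first")
pipe.load_lora_weights("path/to/second_lora.safetensors", adapter_name="second")

# per-adapter strengths correspond to the two weight sliders
pipe.set_adapters(["first", "second"], adapter_weights=[0.8, 0.4])
image = pipe("portrait of a young adult woman, trigger words here").images[0]
```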
49
 
50
+ ## Embeddings
51
 
52
  Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:
53
 
 
57
 
58
  > NB: The trigger token is automatically appended to the negative prompt for you.
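In Diffusers terms this is `load_textual_inversion`; a sketch (the local path is illustrative; `fast_negative` is one of the bundled embeddings):

```python
# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
pipe.load_textual_inversion("path/to/fast_negative.safetensors", token="fast_negative")

# the trigger token goes in the negative prompt
image = pipe(
    "portrait of a young adult woman",
    negative_prompt="fast_negative",
).images[0]
```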
59
 
60
+ ## Styles
61
 
62
  [Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.
63
 
64
  Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.
65
 
66
+ ### Anime
67
 
68
  The `Anime: *` styles work best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:
69
 
 
73
 
74
  Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.
75
 
76
+ ## Scale
77
 
78
  Rescale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever](https://huggingface.co/ai-forever/Real-ESRGAN). Necessary for high-resolution images.
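The ai-forever weights can be used through that repo's standalone `RealESRGAN` package; a sketch assuming its published API (paths illustrative, not necessarily how this Space wires it up):

```python
import torch
from PIL import Image
from RealESRGAN import RealESRGAN

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
upscaler = RealESRGAN(device, scale=4)
upscaler.load_weights("weights/RealESRGAN_x4.pth", download=True)

lowres = Image.open("output.png").convert("RGB")  # e.g. a 512x512 generation
upscaled = upscaler.predict(lowres)               # 4x: 2048x2048
```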
79
 
80
+ ## Image-to-Image
81
 
82
+ The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines.
83
+
84
+ ### Strength
85
 
86
  Denoising strength controls how much the generation differs from the input image: `0` is identical to the original, while `1` is a completely new image. You may also want to increase the number of inference steps, since only roughly `strength × steps` denoising steps actually run. Only applies to the image-to-image input.
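A minimal Diffusers img2img sketch showing where strength plugs in (checkpoint and input path are illustrative):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

init = load_image("input.png").resize((512, 512))
# strength=0.6 keeps the composition; only ~strength * steps denoising steps run
image = pipe(
    "oil painting of a mountain lake",
    image=init,
    strength=0.6,
    num_inference_steps=40,
).images[0]
```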
87
 
 
91
 
92
  For capturing faces, enable `IP-Adapter Face` to use the full-face model. Use a high-quality input image that is mostly a face. You can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
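Loading the full-face adapter in Diffusers looks roughly like this (the exact weight filename in `h94/IP-Adapter` is assumed here; treat it as illustrative):

```python
# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
from diffusers.utils import load_image

pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter-full-face_sd15.safetensors",  # assumed filename
)
pipe.set_ip_adapter_scale(0.6)

face = load_image("face.png")  # a high-quality, mostly-face crop
image = pipe("portrait photo, studio lighting", ip_adapter_image=face).images[0]
```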
93
 
94
+ ## ControlNet
95
+
96
+ The `🎮 Control` tab enables the [ControlNet](https://github.com/lllyasviel/ControlNet) pipelines. Read the [Diffusers docs](https://huggingface.co/docs/diffusers/using-diffusers/controlnet) to learn more.
97
+
98
+ ### Annotators
99
+
100
+ In ControlNet, the input image is a feature map produced by an _annotator_. These are computer vision models used for tasks like edge detection and pose estimation. ControlNet models are trained to understand these feature maps.
101
+
102
+ > NB: Control images will be automatically resized to the nearest multiple of 64 (e.g., 513 -> 512).
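Putting the two pieces together with Diffusers and `controlnet_aux` looks roughly like this (the Canny annotator and ControlNet checkpoint are the ones this Space preloads; the base model and input path are illustrative):

```python
import torch
from controlnet_aux import CannyDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# 1. Annotate: turn the input photo into a Canny edge map
source = load_image("photo.png")
edges = CannyDetector()(source, low_threshold=100, high_threshold=200)

# 2. Generate: the ControlNet was trained on exactly this kind of feature map
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "Lykon/dreamshaper-8", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
image = pipe("a futuristic city at night", image=edges).images[0]
```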
103
+
104
+ ## Advanced
105
 
106
+ ### DeepCache
107
 
108
  [DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:
109
  * `1`: no caching (default)
 
111
  * `3`: balanced
112
  * `4`: more speed
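DeepCache wraps the pipeline as a helper; a sketch of the upstream API, where `cache_interval` corresponds to the `Interval` setting above:

```python
from DeepCache import DeepCacheSDHelper

# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)  # Interval=3: "balanced"
helper.enable()
image = pipe("landscape of a mountain range").images[0]
helper.disable()
```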
113
 
114
+ ### FreeU
115
 
116
  [FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions of the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
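Diffusers exposes FreeU directly on the pipeline; a sketch using the values the FreeU authors suggest for SD 1.5 (the app's exact values may differ):

```python
# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)  # SD 1.5 values from the FreeU repo
image = pipe("portrait of a young adult woman").images[0]
pipe.disable_freeu()
```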
117
 
118
+ ### Clip Skip
119
 
120
  When enabled, the last CLIP layer is skipped. Can sometimes improve image quality.
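Because prompts here go through Compel, clip skip amounts to asking Compel for the penultimate hidden states; a sketch mirroring the `ReturnedEmbeddingsType` used in `lib/inference.py`:

```python
from compel import Compel, ReturnedEmbeddingsType

# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
compel = Compel(
    tokenizer=pipe.tokenizer,
    text_encoder=pipe.text_encoder,
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED,
)
embeds = compel("girl, brunette, blue eyes, armor, nebula, celestial")
image = pipe(prompt_embeds=embeds).images[0]
```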
121
 
122
+ ### Tiny VAE
123
 
124
  Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
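Swapping in TAESD is a one-line change on the pipeline (sketch):

```python
import torch
from diffusers import AutoencoderTiny

# `pipe` is an SD 1.5 StableDiffusionPipeline, loaded as in the Compel sketch above
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")
image = pipe("landscape of a mountain range").images[0]
```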
README.md CHANGED
@@ -6,7 +6,7 @@ emoji: 🧨
6
  colorFrom: purple
7
  colorTo: blue
8
  sdk: gradio
9
- sdk_version: 4.41.0
10
  python_version: 3.11.9
11
  app_file: app.py
12
  fullWidth: false
@@ -25,9 +25,6 @@ models:
25
  - SG161222/Realistic_Vision_V5.1_noVAE
26
  - XpucT/Deliberate
27
  preload_from_hub: # up to 10
28
- - >-
29
- ai-forever/Real-ESRGAN
30
- RealESRGAN_x2.pth,RealESRGAN_x4.pth
31
  - >-
32
  Comfy-Org/stable-diffusion-v1-5-archive
33
  v1-5-pruned-emaonly-fp16.safetensors
@@ -43,6 +40,9 @@ preload_from_hub: # up to 10
43
  - >-
44
  Linaqruf/anything-v3-1
45
  anything-v3-2.safetensors
 
 
 
46
  - >-
47
  Lykon/dreamshaper-8
48
  feature_extractor/preprocessor_config.json,safety_checker/config.json,scheduler/scheduler_config.json,text_encoder/config.json,text_encoder/model.fp16.safetensors,tokenizer/merges.txt,tokenizer/special_tokens_map.json,tokenizer/tokenizer_config.json,tokenizer/vocab.json,unet/config.json,unet/diffusion_pytorch_model.fp16.safetensors,vae/config.json,vae/diffusion_pytorch_model.fp16.safetensors,model_index.json
@@ -62,6 +62,7 @@ preload_from_hub: # up to 10
62
  Gradio app for Stable Diffusion 1.5 featuring:
63
  * txt2img and img2img pipelines with IP-Adapter
64
  * Curated models, LoRAs, and TI embeddings
 
65
  * Compel prompt weighting
66
  * dozens of styles and starter prompts
67
  * Multiple samplers with Karras scheduling
@@ -69,12 +70,8 @@ Gradio app for Stable Diffusion 1.5 featuring:
69
  * Real-ESRGAN upscaling
70
  * Optional tiny autoencoder
71
 
72
- There's also a [CLI](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/cli.py).
73
-
74
  ## Motivation
75
 
76
- I want to:
77
-
78
  * host a free and easy-to-use Stable Diffusion UI on ZeroGPU
79
  * provide the necessary tools for common workflows
80
  * curate useful models, adapters, and embeddings
 
6
  colorFrom: purple
7
  colorTo: blue
8
  sdk: gradio
9
+ sdk_version: 4.44.0
10
  python_version: 3.11.9
11
  app_file: app.py
12
  fullWidth: false
 
25
  - SG161222/Realistic_Vision_V5.1_noVAE
26
  - XpucT/Deliberate
27
  preload_from_hub: # up to 10
 
 
 
28
  - >-
29
  Comfy-Org/stable-diffusion-v1-5-archive
30
  v1-5-pruned-emaonly-fp16.safetensors
 
40
  - >-
41
  Linaqruf/anything-v3-1
42
  anything-v3-2.safetensors
43
+ - >-
44
+ lllyasviel/control_v11p_sd15_canny
45
+ diffusion_pytorch_model.fp16.safetensors
46
  - >-
47
  Lykon/dreamshaper-8
48
  feature_extractor/preprocessor_config.json,safety_checker/config.json,scheduler/scheduler_config.json,text_encoder/config.json,text_encoder/model.fp16.safetensors,tokenizer/merges.txt,tokenizer/special_tokens_map.json,tokenizer/tokenizer_config.json,tokenizer/vocab.json,unet/config.json,unet/diffusion_pytorch_model.fp16.safetensors,vae/config.json,vae/diffusion_pytorch_model.fp16.safetensors,model_index.json
 
62
  Gradio app for Stable Diffusion 1.5 featuring:
63
  * txt2img and img2img pipelines with IP-Adapter
64
  * Curated models, LoRAs, and TI embeddings
65
+ * ControlNet with annotators
66
  * Compel prompt weighting
67
  * dozens of styles and starter prompts
68
  * Multiple samplers with Karras scheduling
 
70
  * Real-ESRGAN upscaling
71
  * Optional tiny autoencoder
72
 
 
 
73
  ## Motivation
74
 
 
 
75
  * host a free and easy-to-use Stable Diffusion UI on ZeroGPU
76
  * provide the necessary tools for common workflows
77
  * curate useful models, adapters, and embeddings
app.css CHANGED
@@ -30,7 +30,7 @@
30
  overflow-y: auto;
31
  }
32
  .gallery, .gallery .grid-wrap {
33
- height: calc(100vh - 422px);
34
  max-height: none;
35
  }
36
 
@@ -108,7 +108,10 @@
108
  content: 'Random prompt';
109
  }
110
  .popover#clear:hover::after {
111
- content: 'Clear gallery';
 
 
 
112
  }
113
  .popover#refresh:hover::after {
114
  content: var(--seed, "-1");
 
30
  overflow-y: auto;
31
  }
32
  .gallery, .gallery .grid-wrap {
33
+ height: calc(100vh - 430px);
34
  max-height: none;
35
  }
36
 
 
108
  content: 'Random prompt';
109
  }
110
  .popover#clear:hover::after {
111
+ content: 'Clear';
112
+ }
113
+ .popover#clear-control:hover::after {
114
+ content: 'Clear';
115
  }
116
  .popover#refresh:hover::after {
117
  content: var(--seed, "-1");
app.py CHANGED
@@ -6,13 +6,16 @@ import random
6
  import gradio as gr
7
 
8
  from lib import (
 
9
  Config,
10
  async_call,
11
  disable_progress_bars,
12
  download_civit_file,
13
  download_repo_files,
14
  generate,
 
15
  read_file,
 
16
  )
17
 
18
  # the CSS `content` attribute expects a string so we need to wrap the number in quotes
@@ -84,6 +87,15 @@ async def random_fn():
84
  return gr.Textbox(value=random.choice(prompts))
85
 
86
 
87
  async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
88
  if len(args) > 0:
89
  prompt = args[0]
@@ -92,6 +104,7 @@ async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
92
  if prompt is None or prompt.strip() == "":
93
  raise gr.Error("You must enter a prompt")
94
 
 
95
  DISABLE_IMAGE_PROMPT, DISABLE_IP_IMAGE_PROMPT = args[-2:]
96
  gen_args = list(args[:-2])
97
  if DISABLE_IMAGE_PROMPT:
@@ -148,25 +161,24 @@ with gr.Blocks(
148
  with gr.Tabs():
149
  with gr.TabItem("🏠 Text"):
150
  with gr.Column():
151
- with gr.Group():
152
- output_images = gr.Gallery(
153
- elem_classes=["gallery"],
154
- show_share_button=False,
155
- object_fit="cover",
156
- interactive=False,
157
- show_label=False,
158
- label="Output",
159
- format="png",
160
- columns=2,
161
- )
162
- prompt = gr.Textbox(
163
- placeholder="What do you want to see?",
164
- autoscroll=False,
165
- show_label=False,
166
- label="Prompt",
167
- max_lines=3,
168
- lines=3,
169
- )
170
 
171
  # Buttons
172
  with gr.Row():
@@ -196,72 +208,104 @@ with gr.Blocks(
196
 
197
  # img2img tab
198
  with gr.TabItem("🖼️ Image"):
199
- with gr.Group():
200
- with gr.Row():
201
- image_prompt = gr.Image(
202
- show_share_button=False,
203
- label="Initial Image",
204
- min_width=320,
205
- format="png",
206
- type="pil",
207
- )
208
- ip_image_prompt = gr.Image(
209
- show_share_button=False,
210
- label="IP-Adapter Image",
211
- min_width=320,
212
- format="png",
213
- type="pil",
214
- )
215
 
216
- with gr.Row():
217
- image_select = gr.Dropdown(
218
- info="Use an initial image from the gallery",
219
- choices=[("None", -1)],
220
- label="Gallery Image",
221
- interactive=True,
222
- filterable=False,
223
- value=-1,
224
- )
225
- ip_image_select = gr.Dropdown(
226
- info="Use an IP-Adapter image from the gallery",
227
- label="Gallery Image (IP-Adapter)",
228
- choices=[("None", -1)],
229
- interactive=True,
230
- filterable=False,
231
- value=-1,
232
- )
233
 
234
- with gr.Row():
235
- denoising_strength = gr.Slider(
236
- value=Config.DENOISING_STRENGTH,
237
- label="Denoising Strength",
238
- minimum=0.0,
239
- maximum=1.0,
240
- step=0.1,
241
- )
242
 
243
- with gr.Row():
244
- disable_image = gr.Checkbox(
245
- elem_classes=["checkbox"],
246
- label="Disable Initial Image",
247
- value=False,
248
- )
249
- disable_ip_image = gr.Checkbox(
250
- elem_classes=["checkbox"],
251
- label="Disable IP-Adapter Image",
252
- value=False,
253
- )
254
- ip_face = gr.Checkbox(
255
- elem_classes=["checkbox"],
256
- label="Use IP-Adapter Face",
257
- value=False,
258
- )
259
 
260
- # img2img tab
261
  with gr.TabItem("🎮 Control"):
262
- gr.Markdown(
263
- "[ControlNet](https://github.com/lllyasviel/ControlNet) with [preprocessors](https://github.com/huggingface/controlnet_aux) coming soon!"
264
- )
 
 
265
 
266
  with gr.TabItem("⚙️ Menu"):
267
  with gr.Group():
@@ -445,6 +489,12 @@ with gr.Blocks(
445
  value=False,
446
  )
447
 
 
 
 
 
 
 
448
  random_btn.click(random_fn, inputs=[], outputs=[prompt], show_api=False)
449
 
450
  refresh_btn.click(None, inputs=[], outputs=[seed], js=refresh_seed_js)
@@ -530,7 +580,7 @@ with gr.Blocks(
530
  negative_prompt,
531
  image_prompt,
532
  ip_image_prompt,
533
- ip_face,
534
  lora_1,
535
  lora_1_weight,
536
  lora_2,
@@ -540,6 +590,7 @@ with gr.Blocks(
540
  seed,
541
  model,
542
  scheduler,
 
543
  width,
544
  height,
545
  guidance_scale,
@@ -552,6 +603,7 @@ with gr.Blocks(
552
  use_taesd,
553
  use_freeu,
554
  use_clip_skip,
 
555
  DISABLE_IMAGE_PROMPT,
556
  DISABLE_IP_IMAGE_PROMPT,
557
  ],
 
6
  import gradio as gr
7
 
8
  from lib import (
9
+ CannyAnnotator,
10
  Config,
11
  async_call,
12
  disable_progress_bars,
13
  download_civit_file,
14
  download_repo_files,
15
  generate,
16
+ get_valid_size,
17
  read_file,
18
+ resize_image,
19
  )
20
 
21
  # the CSS `content` attribute expects a string so we need to wrap the number in quotes
 
87
  return gr.Textbox(value=random.choice(prompts))
88
 
89
 
90
+ # TODO: move this to another file once more annotators are added; will need @GPU decorator
91
+ async def annotate_fn(image, annotator):
92
+ size = get_valid_size(image)
93
+ image = resize_image(image, size)
94
+ if annotator == "canny":
95
+ canny = CannyAnnotator()
96
+ return canny(image, size)
97
+
98
+
99
  async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
100
  if len(args) > 0:
101
  prompt = args[0]
 
104
  if prompt is None or prompt.strip() == "":
105
  raise gr.Error("You must enter a prompt")
106
 
107
+ # always the last arguments
108
  DISABLE_IMAGE_PROMPT, DISABLE_IP_IMAGE_PROMPT = args[-2:]
109
  gen_args = list(args[:-2])
110
  if DISABLE_IMAGE_PROMPT:
 
161
  with gr.Tabs():
162
  with gr.TabItem("🏠 Text"):
163
  with gr.Column():
164
+ output_images = gr.Gallery(
165
+ elem_classes=["gallery"],
166
+ show_share_button=False,
167
+ object_fit="cover",
168
+ interactive=False,
169
+ show_label=False,
170
+ label="Output",
171
+ format="png",
172
+ columns=2,
173
+ )
174
+ prompt = gr.Textbox(
175
+ placeholder="What do you want to see?",
176
+ autoscroll=False,
177
+ show_label=False,
178
+ label="Prompt",
179
+ max_lines=3,
180
+ lines=3,
181
+ )
 
182
 
183
  # Buttons
184
  with gr.Row():
 
208
 
209
  # img2img tab
210
  with gr.TabItem("🖼️ Image"):
211
+ with gr.Row():
212
+ image_prompt = gr.Image(
213
+ show_share_button=False,
214
+ label="Initial Image",
215
+ min_width=320,
216
+ format="png",
217
+ type="pil",
218
+ )
219
+ ip_image_prompt = gr.Image(
220
+ show_share_button=False,
221
+ label="IP-Adapter Image",
222
+ min_width=320,
223
+ format="png",
224
+ type="pil",
225
+ )
 
226
 
227
+ with gr.Row():
228
+ image_select = gr.Dropdown(
229
+ info="Use an initial image from the gallery",
230
+ choices=[("None", -1)],
231
+ label="Gallery Image",
232
+ interactive=True,
233
+ filterable=False,
234
+ value=-1,
235
+ )
236
+ ip_image_select = gr.Dropdown(
237
+ info="Use an IP-Adapter image from the gallery",
238
+ label="Gallery Image",
239
+ choices=[("None", -1)],
240
+ interactive=True,
241
+ filterable=False,
242
+ value=-1,
243
+ )
244
 
245
+ with gr.Row():
246
+ denoising_strength = gr.Slider(
247
+ value=Config.DENOISING_STRENGTH,
248
+ label="Denoising Strength",
249
+ minimum=0.0,
250
+ maximum=1.0,
251
+ step=0.1,
252
+ )
253
 
254
+ with gr.Row():
255
+ disable_image = gr.Checkbox(
256
+ elem_classes=["checkbox"],
257
+ label="Disable Initial Image",
258
+ value=False,
259
+ )
260
+ disable_ip_image = gr.Checkbox(
261
+ elem_classes=["checkbox"],
262
+ label="Disable IP-Adapter Image",
263
+ value=False,
264
+ )
265
+ use_ip_face = gr.Checkbox(
266
+ elem_classes=["checkbox"],
267
+ label="Use IP-Adapter Face",
268
+ value=False,
269
+ )
270
 
271
+ # controlnet tab
272
  with gr.TabItem("🎮 Control"):
273
+ with gr.Row():
274
+ control_image_input = gr.Image(
275
+ show_share_button=False,
276
+ label="Control Image",
277
+ min_width=320,
278
+ format="png",
279
+ type="pil",
280
+ )
281
+ control_image_prompt = gr.Image(
282
+ interactive=False,
283
+ show_share_button=False,
284
+ label="Control Image Output",
285
+ show_label=False,
286
+ min_width=320,
287
+ format="png",
288
+ type="pil",
289
+ )
290
+
291
+ with gr.Row():
292
+ control_annotator = gr.Dropdown(
293
+ choices=[("Canny", "canny")],
294
+ label="Annotator",
295
+ filterable=False,
296
+ value="canny",
297
+ )
298
+
299
+ with gr.Row():
300
+ annotate_btn = gr.Button("Annotate", variant="primary")
301
+ clear_control_btn = gr.ClearButton(
302
+ elem_classes=["icon-button", "popover"],
303
+ components=[control_image_prompt],
304
+ variant="secondary",
305
+ elem_id="clear-control",
306
+ min_width=0,
307
+ value="🗑️",
308
+ )
309
 
310
  with gr.TabItem("⚙️ Menu"):
311
  with gr.Group():
 
489
  value=False,
490
  )
491
 
492
+ annotate_btn.click(
493
+ annotate_fn,
494
+ inputs=[control_image_input, control_annotator],
495
+ outputs=[control_image_prompt],
496
+ )
497
+
498
  random_btn.click(random_fn, inputs=[], outputs=[prompt], show_api=False)
499
 
500
  refresh_btn.click(None, inputs=[], outputs=[seed], js=refresh_seed_js)
 
580
  negative_prompt,
581
  image_prompt,
582
  ip_image_prompt,
583
+ control_image_prompt,
584
  lora_1,
585
  lora_1_weight,
586
  lora_2,
 
590
  seed,
591
  model,
592
  scheduler,
593
+ control_annotator,
594
  width,
595
  height,
596
  guidance_scale,
 
603
  use_taesd,
604
  use_freeu,
605
  use_clip_skip,
606
+ use_ip_face,
607
  DISABLE_IMAGE_PROMPT,
608
  DISABLE_IP_IMAGE_PROMPT,
609
  ],
lib/__init__.py CHANGED
@@ -1,3 +1,4 @@
 
1
  from .config import Config
2
  from .inference import generate
3
  from .loader import Loader
@@ -9,13 +10,16 @@ from .utils import (
9
  download_civit_file,
10
  download_repo_files,
11
  enable_progress_bars,
 
12
  load_json,
13
  read_file,
 
14
  safe_progress,
15
  timer,
16
  )
17
 
18
  __all__ = [
 
19
  "Config",
20
  "Loader",
21
  "Logger",
@@ -26,8 +30,10 @@ __all__ = [
26
  "download_repo_files",
27
  "enable_progress_bars",
28
  "generate",
 
29
  "load_json",
30
  "read_file",
 
31
  "safe_progress",
32
  "timer",
33
  ]
 
1
+ from .annotators import CannyAnnotator
2
  from .config import Config
3
  from .inference import generate
4
  from .loader import Loader
 
10
  download_civit_file,
11
  download_repo_files,
12
  enable_progress_bars,
13
+ get_valid_size,
14
  load_json,
15
  read_file,
16
+ resize_image,
17
  safe_progress,
18
  timer,
19
  )
20
 
21
  __all__ = [
22
+ "CannyAnnotator",
23
  "Config",
24
  "Loader",
25
  "Logger",
 
30
  "download_repo_files",
31
  "enable_progress_bars",
32
  "generate",
33
+ "get_valid_size",
34
  "load_json",
35
  "read_file",
36
+ "resize_image",
37
  "safe_progress",
38
  "timer",
39
  ]
lib/annotators.py ADDED
@@ -0,0 +1,25 @@
 
1
+ from threading import Lock
2
+
3
+ from controlnet_aux import CannyDetector
4
+
5
+
6
+ class CannyAnnotator:
7
+ _instance = None
8
+ _lock = Lock()
9
+
10
+ def __new__(cls):
11
+ with cls._lock:
12
+ if cls._instance is None:
13
+ cls._instance = super().__new__(cls)
14
+ cls._instance.model = CannyDetector()
15
+ return cls._instance
16
+
17
+ def __call__(self, img, size):
18
+ resolution = min(*size)
19
+ return self.model(
20
+ img,
21
+ low_threshold=100,
22
+ high_threshold=200,
23
+ detect_resolution=resolution,
24
+ image_resolution=resolution,
25
+ )
lib/config.py CHANGED
@@ -16,7 +16,12 @@ from diffusers import (
16
  from diffusers.utils import logging as diffusers_logging
17
  from transformers import logging as transformers_logging
18
 
19
- from .pipelines import CustomStableDiffusionImg2ImgPipeline, CustomStableDiffusionPipeline
 
 
 
 
 
20
 
21
  # improved GPU handling and progress bars; set before importing spaces
22
  os.environ["ZEROGPU_V2"] = "1"
@@ -53,11 +58,14 @@ Config = SimpleNamespace(
53
  ZERO_GPU=import_module("spaces").config.Config.zero_gpu,
54
  HF_MODELS={
55
  # downloaded on startup
56
- "Lykon/dreamshaper-8": [*_sd_files],
57
  "Comfy-Org/stable-diffusion-v1-5-archive": ["v1-5-pruned-emaonly-fp16.safetensors"],
58
  "cyberdelia/CyberRealistic": ["CyberRealistic_V5_FP16.safetensors"],
59
  "fluently/Fluently-v4": ["Fluently-v4.safetensors"],
60
  "Linaqruf/anything-v3-1": ["anything-v3-2.safetensors"],
 
 
 
61
  "prompthero/openjourney-v4": ["openjourney-v4.ckpt"],
62
  "SG161222/Realistic_Vision_V5.1_noVAE": ["Realistic_Vision_V5.1_fp16-no-ema.safetensors"],
63
  "XpucT/Deliberate": ["Deliberate_v6.safetensors"],
@@ -89,6 +97,8 @@ Config = SimpleNamespace(
89
  PIPELINES={
90
  "txt2img": CustomStableDiffusionPipeline,
91
  "img2img": CustomStableDiffusionImg2ImgPipeline,
 
 
92
  },
93
  MODEL="Lykon/dreamshaper-8",
94
  MODELS=[
@@ -121,6 +131,9 @@ Config = SimpleNamespace(
121
  "PNDM": PNDMScheduler,
122
  "UniPC 2M": UniPCMultistepScheduler,
123
  },
 
 
 
124
  EMBEDDING="fast_negative",
125
  EMBEDDINGS=[
126
  "cyberrealistic_negative",
 
16
  from diffusers.utils import logging as diffusers_logging
17
  from transformers import logging as transformers_logging
18
 
19
+ from .pipelines import (
20
+ CustomStableDiffusionControlNetImg2ImgPipeline,
21
+ CustomStableDiffusionControlNetPipeline,
22
+ CustomStableDiffusionImg2ImgPipeline,
23
+ CustomStableDiffusionPipeline,
24
+ )
25
 
26
  # improved GPU handling and progress bars; set before importing spaces
27
  os.environ["ZEROGPU_V2"] = "1"
 
58
  ZERO_GPU=import_module("spaces").config.Config.zero_gpu,
59
  HF_MODELS={
60
  # downloaded on startup
61
+ "ai-forever/Real-ESRGAN": ["RealESRGAN_x2.pth", "RealESRGAN_x4.pth"],
62
  "Comfy-Org/stable-diffusion-v1-5-archive": ["v1-5-pruned-emaonly-fp16.safetensors"],
63
  "cyberdelia/CyberRealistic": ["CyberRealistic_V5_FP16.safetensors"],
64
  "fluently/Fluently-v4": ["Fluently-v4.safetensors"],
65
  "Linaqruf/anything-v3-1": ["anything-v3-2.safetensors"],
66
+ "lllyasviel/control_v11p_sd15_canny": ["diffusion_pytorch_model.fp16.safetensors"],
67
+ "Lykon/dreamshaper-8": [*_sd_files],
68
+ "madebyollin/taesd": ["diffusion_pytorch_model.safetensors"],
69
  "prompthero/openjourney-v4": ["openjourney-v4.ckpt"],
70
  "SG161222/Realistic_Vision_V5.1_noVAE": ["Realistic_Vision_V5.1_fp16-no-ema.safetensors"],
71
  "XpucT/Deliberate": ["Deliberate_v6.safetensors"],
 
97
  PIPELINES={
98
  "txt2img": CustomStableDiffusionPipeline,
99
  "img2img": CustomStableDiffusionImg2ImgPipeline,
100
+ "controlnet_txt2img": CustomStableDiffusionControlNetPipeline,
101
+ "controlnet_img2img": CustomStableDiffusionControlNetImg2ImgPipeline,
102
  },
103
  MODEL="Lykon/dreamshaper-8",
104
  MODELS=[
 
131
  "PNDM": PNDMScheduler,
132
  "UniPC 2M": UniPCMultistepScheduler,
133
  },
134
+ ANNOTATORS={
135
+ "canny": "lllyasviel/control_v11p_sd15_canny",
136
+ },
137
  EMBEDDING="fast_negative",
138
  EMBEDDINGS=[
139
  "cyberrealistic_negative",
lib/inference.py CHANGED
@@ -98,7 +98,7 @@ def generate(
98
  negative_prompt="",
99
  image_prompt=None,
100
  ip_image_prompt=None,
101
- ip_face=False,
102
  lora_1=None,
103
  lora_1_weight=0.0,
104
  lora_2=None,
@@ -108,6 +108,7 @@ def generate(
108
  seed=None,
109
  model="Lykon/dreamshaper-8",
110
  scheduler="DDIM",
 
111
  width=512,
112
  height=512,
113
  guidance_scale=7.5,
@@ -120,6 +121,7 @@ def generate(
120
  taesd=False,
121
  freeu=False,
122
  clip_skip=False,
 
123
  Error=Exception,
124
  Info=None,
125
  progress=None,
@@ -142,6 +144,10 @@ def generate(
142
  CURRENT_IMAGE = 1
143
 
144
  KIND = "img2img" if image_prompt is not None else "txt2img"
 
 
 
 
145
 
146
  EMBEDDINGS_TYPE = (
147
  ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED
@@ -174,6 +180,7 @@ def generate(
174
  IP_ADAPTER,
175
  model,
176
  scheduler,
 
177
  deepcache,
178
  scale,
179
  karras,
@@ -293,6 +300,13 @@ def generate(
293
  kwargs["strength"] = denoising_strength
294
  kwargs["image"] = prepare_image(image_prompt, (width, height))
295
 
 
 
 
 
 
 
 
296
  if IP_ADAPTER:
297
  # don't resize full-face images since they are usually square crops
298
  size = None if ip_face else (width, height)
 
98
  negative_prompt="",
99
  image_prompt=None,
100
  ip_image_prompt=None,
101
+ control_image_prompt=None,
102
  lora_1=None,
103
  lora_1_weight=0.0,
104
  lora_2=None,
 
108
  seed=None,
109
  model="Lykon/dreamshaper-8",
110
  scheduler="DDIM",
111
+ annotator="canny",
112
  width=512,
113
  height=512,
114
  guidance_scale=7.5,
 
121
  taesd=False,
122
  freeu=False,
123
  clip_skip=False,
124
+ ip_face=False,
125
  Error=Exception,
126
  Info=None,
127
  progress=None,
 
144
  CURRENT_IMAGE = 1
145
 
146
  KIND = "img2img" if image_prompt is not None else "txt2img"
147
+ KIND = f"controlnet_{KIND}" if control_image_prompt is not None else KIND
148
+
149
+ if KIND.startswith("controlnet_") and annotator.lower() not in Config.ANNOTATORS.keys():
150
+ raise Error(f"Invalid annotator: {annotator}")
151
 
152
  EMBEDDINGS_TYPE = (
153
  ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED
 
180
  IP_ADAPTER,
181
  model,
182
  scheduler,
183
+ annotator,
184
  deepcache,
185
  scale,
186
  karras,
 
300
  kwargs["strength"] = denoising_strength
301
  kwargs["image"] = prepare_image(image_prompt, (width, height))
302
 
303
+ if KIND == "controlnet_txt2img":
304
+ # don't resize controlnet images
305
+ kwargs["image"] = prepare_image(control_image_prompt, None)
306
+
307
+ if KIND == "controlnet_img2img":
308
+ kwargs["control_image"] = prepare_image(control_image_prompt, None)
309
+
310
  if IP_ADAPTER:
311
  # don't resize full-face images since they are usually square crops
312
  size = None if ip_face else (width, height)
lib/loader.py CHANGED
@@ -3,6 +3,7 @@ from threading import Lock
3
 
4
  import torch
5
  from DeepCache import DeepCacheSDHelper
 
6
  from diffusers.models import AutoencoderKL, AutoencoderTiny
7
  from diffusers.models.attention_processor import AttnProcessor2_0, IPAdapterAttnProcessor2_0
8
 
@@ -23,6 +24,7 @@ class Loader:
23
  cls._instance.pipe = None
24
  cls._instance.model = None
25
  cls._instance.upscaler = None
 
26
  cls._instance.ip_adapter = None
27
  cls._instance.log = Logger("Loader")
28
  return cls._instance
@@ -75,15 +77,36 @@ class Loader:
75
  return True
76
  return False
77
 
78
- def _should_unload_pipeline(self, kind="", model=""):
 
  if self.pipe is None:
80
  return False
81
  if self.model.lower() != model.lower():
82
  return True
83
  if kind == "txt2img" and not isinstance(self.pipe, Config.PIPELINES["txt2img"]):
84
- return True # txt2img -> img2img
85
  if kind == "img2img" and not isinstance(self.pipe, Config.PIPELINES["img2img"]):
86
- return True # img2img -> txt2img
 
87
  return False
88
 
89
  def _unload_upscaler(self):
@@ -128,7 +151,16 @@ class Loader:
128
  with timer(f"Unloading {self.model}", logger=self.log.info):
129
  self.pipe.to("cpu")
130
 
131
- def _unload(self, kind="", model="", ip_adapter="", deepcache=1, scale=1, freeu=False):
 
  to_unload = []
133
  if self._should_unload_deepcache(deepcache): # remove deepcache first
134
  self._unload_deepcache()
@@ -144,7 +176,10 @@ class Loader:
144
  self._unload_ip_adapter()
145
  to_unload.append("ip_adapter")
146
 
147
- if self._should_unload_pipeline(kind, model):
 
 
 
148
  self._unload_pipeline()
149
  to_unload.append("model")
150
  to_unload.append("pipe")
@@ -288,6 +323,7 @@ class Loader:
288
  ip_adapter,
289
  model,
290
  scheduler,
 
291
  deepcache,
292
  scale,
293
  karras,
@@ -336,7 +372,15 @@ class Loader:
336
  # defaults to float32
337
  pipe_kwargs["torch_dtype"] = torch.float16
338
 
339
- self._unload(kind, model, ip_adapter, deepcache, scale, freeu)
 
 
 
 
 
 
 
 
340
  self._load_pipeline(kind, model, progress, **pipe_kwargs)
341
 
342
  # error loading model
 
3
 
4
  import torch
5
  from DeepCache import DeepCacheSDHelper
6
+ from diffusers import ControlNetModel
7
  from diffusers.models import AutoencoderKL, AutoencoderTiny
8
  from diffusers.models.attention_processor import AttnProcessor2_0, IPAdapterAttnProcessor2_0
9
 
 
24
  cls._instance.pipe = None
25
  cls._instance.model = None
26
  cls._instance.upscaler = None
27
+ cls._instance.controlnet = None
28
  cls._instance.ip_adapter = None
29
  cls._instance.log = Logger("Loader")
30
  return cls._instance
 
77
  return True
78
  return False
79
 
80
+ def _should_unload_controlnet(self, kind="", controlnet=""):
81
+ if self.controlnet is None:
82
+ return False
83
+ if self.controlnet.lower() != controlnet.lower():
84
+ return True
85
+ if not kind.startswith("controlnet_"):
86
+ return True
87
+ return False
88
+
89
+ def _should_unload_pipeline(self, kind="", model="", controlnet=""):
90
  if self.pipe is None:
91
  return False
92
  if self.model.lower() != model.lower():
93
  return True
94
  if kind == "txt2img" and not isinstance(self.pipe, Config.PIPELINES["txt2img"]):
95
+ return True
96
  if kind == "img2img" and not isinstance(self.pipe, Config.PIPELINES["img2img"]):
97
+ return True
98
+ if kind == "controlnet_txt2img" and not isinstance(
99
+ self.pipe,
100
+ Config.PIPELINES["controlnet_txt2img"],
101
+ ):
102
+ return True
103
+ if kind == "controlnet_img2img" and not isinstance(
104
+ self.pipe,
105
+ Config.PIPELINES["controlnet_img2img"],
106
+ ):
107
+ return True
108
+ if self._should_unload_controlnet(kind, controlnet):
109
+ return True
110
  return False
111
 
112
  def _unload_upscaler(self):
 
151
  with timer(f"Unloading {self.model}", logger=self.log.info):
152
  self.pipe.to("cpu")
153
 
154
+ def _unload(
155
+ self,
156
+ kind="",
157
+ model="",
158
+ controlnet="",
159
+ ip_adapter="",
160
+ deepcache=1,
161
+ scale=1,
162
+ freeu=False,
163
+ ):
164
  to_unload = []
165
  if self._should_unload_deepcache(deepcache): # remove deepcache first
166
  self._unload_deepcache()
 
176
  self._unload_ip_adapter()
177
  to_unload.append("ip_adapter")
178
 
179
+ if self._should_unload_controlnet(kind, controlnet):
180
+ to_unload.append("controlnet")
181
+
182
+ if self._should_unload_pipeline(kind, model, controlnet):
183
  self._unload_pipeline()
184
  to_unload.append("model")
185
  to_unload.append("pipe")
 
323
  ip_adapter,
324
  model,
325
  scheduler,
326
+ annotator,
327
  deepcache,
328
  scale,
329
  karras,
 
372
  # defaults to float32
373
  pipe_kwargs["torch_dtype"] = torch.float16
374
 
375
+ if kind.startswith("controlnet_"):
376
+ pipe_kwargs["controlnet"] = ControlNetModel.from_pretrained(
377
+ Config.ANNOTATORS[annotator],
378
+ torch_dtype=torch.float16,
379
+ variant="fp16",
380
+ )
381
+ self.controlnet = annotator
382
+
383
+ self._unload(kind, model, annotator, ip_adapter, deepcache, scale, freeu)
384
  self._load_pipeline(kind, model, progress, **pipe_kwargs)
385
 
386
  # error loading model
lib/pipelines.py CHANGED
@@ -1,7 +1,12 @@
1
  import os
2
  from importlib import import_module
3
 
4
- from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline
 
 
 
 
 
5
  from diffusers.loaders.single_file import (
6
  SINGLE_FILE_OPTIONAL_COMPONENTS,
7
  load_single_file_sub_model,
@@ -220,3 +225,17 @@ class CustomStableDiffusionPipeline(CustomDiffusionMixin, StableDiffusionPipelin
220
 
221
  class CustomStableDiffusionImg2ImgPipeline(CustomDiffusionMixin, StableDiffusionImg2ImgPipeline):
222
  pass
 
 
1
  import os
2
  from importlib import import_module
3
 
4
+ from diffusers import (
5
+ StableDiffusionControlNetImg2ImgPipeline,
6
+ StableDiffusionControlNetPipeline,
7
+ StableDiffusionImg2ImgPipeline,
8
+ StableDiffusionPipeline,
9
+ )
10
  from diffusers.loaders.single_file import (
11
  SINGLE_FILE_OPTIONAL_COMPONENTS,
12
  load_single_file_sub_model,
 
225
 
226
  class CustomStableDiffusionImg2ImgPipeline(CustomDiffusionMixin, StableDiffusionImg2ImgPipeline):
227
  pass
228
+
229
+
230
+ class CustomStableDiffusionControlNetPipeline(
231
+ CustomDiffusionMixin,
232
+ StableDiffusionControlNetPipeline,
233
+ ):
234
+ pass
235
+
236
+
237
+ class CustomStableDiffusionControlNetImg2ImgPipeline(
238
+ CustomDiffusionMixin,
239
+ StableDiffusionControlNetImg2ImgPipeline,
240
+ ):
241
+ pass
lib/utils.py CHANGED
@@ -7,11 +7,14 @@ from contextlib import contextmanager
7
  from typing import Callable, TypeVar
8
 
9
  import anyio
 
10
  import httpx
 
11
  from anyio import Semaphore
12
  from diffusers.utils import logging as diffusers_logging
13
  from huggingface_hub._snapshot_download import snapshot_download
14
  from huggingface_hub.utils import are_progress_bars_disabled
 
15
  from transformers import logging as transformers_logging
16
  from typing_extensions import ParamSpec
17
 
@@ -107,6 +110,63 @@ def download_civit_file(lora_id, version_id, file_path=".", token=None):
107
  log.error(f"RequestError: {e}")
108
 
109
 
 
110
  # like the original but supports args and kwargs instead of a dict
111
  # https://github.com/huggingface/huggingface-inference-toolkit/blob/0.2.0/src/huggingface_inference_toolkit/async_utils.py
112
  async def async_call(fn: Callable[P, T], *args: P.args, **kwargs: P.kwargs) -> T:
 
7
  from typing import Callable, TypeVar
8
 
9
  import anyio
10
+ import cv2
11
  import httpx
12
+ import numpy as np
13
  from anyio import Semaphore
14
  from diffusers.utils import logging as diffusers_logging
15
  from huggingface_hub._snapshot_download import snapshot_download
16
  from huggingface_hub.utils import are_progress_bars_disabled
17
+ from PIL import Image
18
  from transformers import logging as transformers_logging
19
  from typing_extensions import ParamSpec
20
 
 
110
  log.error(f"RequestError: {e}")
111
 
112
 
113
+ # resize an image while preserving the aspect ratio (size is width-first)
114
+ def resize_image(image, size):
115
+ if isinstance(image, Image.Image):
116
+ image = np.array(image)
117
+
118
+ H, W, _ = image.shape
119
+ W = float(W)
120
+ H = float(H)
121
+ target_W, target_H = size
122
+
123
+ # Use the smaller scaling factor to maintain the aspect ratio.
124
+ k_w = float(target_W) / W
125
+ k_h = float(target_H) / H
126
+ k = min(k_w, k_h)
127
+
128
+ new_W = int(np.round(W * k / 64.0)) * 64
129
+ new_H = int(np.round(H * k / 64.0)) * 64
130
+ img = cv2.resize(
131
+ image,
132
+ (new_W, new_H),
133
+ interpolation=cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA,
134
+ )
135
+ return img
136
+
137
+
138
+ # ensure image is within bounds
139
+ def get_valid_size(image, step=64, low=512, high=4096):
140
+ def round_down(x, step=step):
141
+ return int((x // step) * step)
142
+
143
+ def clamp_range(x, low=low, high=high):
144
+ return max(low, min(x, high))
145
+
146
+ if isinstance(image, Image.Image):
147
+ image = np.array(image)
148
+
149
+ H, W = image.shape[:2]
150
+ ar = W / H
151
+
152
+ # try width first
153
+ if W > H:
154
+ new_W = round_down(clamp_range(W))
155
+ new_H = round_down(new_W / ar)
156
+ else:
157
+ new_H = round_down(clamp_range(H))
158
+ new_W = round_down(new_H * ar)
159
+
160
+ # if the new size is out of bounds, try the other dimension
161
+ if new_W < low or new_W > high:
162
+ new_W = round_down(clamp_range(W))
163
+ new_H = round_down(new_W / ar)
164
+ if new_H < low or new_H > high:
165
+ new_H = round_down(clamp_range(H))
166
+ new_W = round_down(new_H * ar)
167
+ return (new_W, new_H)
168
+
169
+
170
  # like the original but supports args and kwargs instead of a dict
171
  # https://github.com/huggingface/huggingface-inference-toolkit/blob/0.2.0/src/huggingface_inference_toolkit/async_utils.py
172
  async def async_call(fn: Callable[P, T], *args: P.args, **kwargs: P.kwargs) -> T:
requirements.txt CHANGED
@@ -1,5 +1,6 @@
1
  anyio==4.6.0
2
  compel==2.0.3
 
3
  deepcache==0.1.1
4
  diffusers==0.30.3
5
  einops==0.8.0
@@ -7,7 +8,9 @@ gradio==4.44.0
7
  h2
8
  hf-transfer
9
  httpx
 
10
  numpy==1.26.4
 
11
  peft
12
  ruff==0.6.7
13
  spaces==0.30.2
 
1
  anyio==4.6.0
2
  compel==2.0.3
3
+ controlnet-aux==0.0.9
4
  deepcache==0.1.1
5
  diffusers==0.30.3
6
  einops==0.8.0
 
8
  h2
9
  hf-transfer
10
  httpx
11
+ mediapipe
12
  numpy==1.26.4
13
+ opencv-contrib-python
14
  peft
15
  ruff==0.6.7
16
  spaces==0.30.2