Spaces: adamelliotfields/diffusion (Running on Zero)

adamelliotfields committed
Commit 98afd85 (parent: 9e8b99d): ControlNet

Changed files:
- DOCS.md +28 -16
- README.md +5 -8
- app.css +5 -2
- app.py +133 -81
- lib/__init__.py +6 -0
- lib/annotators.py +25 -0
- lib/config.py +15 -2
- lib/inference.py +15 -1
- lib/loader.py +50 -6
- lib/pipelines.py +20 -1
- lib/utils.py +60 -0
- requirements.txt +3 -0
DOCS.md
CHANGED
@@ -1,8 +1,8 @@
+# Diffusion ZERO

TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.

+## Prompting

Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.

@@ -19,13 +19,13 @@ This is the same syntax used in [InvokeAI](https://invoke-ai.github.io/InvokeAI/
| `(blue)1.2` | `(blue:1.2)` |
| `(blue)0.8` | `(blue:0.8)` |
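For reference, a minimal sketch of how Compel-weighted prompts are typically passed to a Diffusers SD 1.5 pipeline (illustrative only, not this Space's exact wiring):

```python
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# "(blue)1.2" up-weights "blue"; "(grainy)0.8" down-weights "grainy"
positive = compel.build_conditioning_tensor("portrait of a woman with (blue)1.2 eyes")
negative = compel.build_conditioning_tensor("(grainy)0.8, lowres, watermark")
[positive, negative] = compel.pad_conditioning_tensors_to_same_length([positive, negative])

image = pipe(
    prompt_embeds=positive,
    negative_prompt_embeds=negative,
    num_inference_steps=30,
).images[0]
```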
+### Arrays

Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).

> NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
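The expansion semantics can be illustrated with a small standalone snippet (a hypothetical helper, not the Space's actual implementation):

```python
import re
from itertools import product

def expand_arrays(prompt: str) -> list[str]:
    """Expand [[a,b]] groups into the cartesian product of their options."""
    groups = re.findall(r"\[\[(.*?)\]\]", prompt)
    if not groups:
        return [prompt]
    options = [[opt.strip() for opt in group.split(",")] for group in groups]
    expanded = []
    for combo in product(*options):
        result = prompt
        for choice in combo:
            result = re.sub(r"\[\[.*?\]\]", choice, result, count=1)
        expanded.append(result)
    return expanded

print(expand_arrays("an adult [[blonde,brunette]] [[man,woman]]"))
# ['an adult blonde man', 'an adult blonde woman',
#  'an adult brunette man', 'an adult brunette woman']
```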
+## Models

Each model checkpoint has a different aesthetic:

@@ -38,7 +38,7 @@ Each model checkpoint has a different aesthetic:
* [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
* [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized

+## LoRA

Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:

@@ -47,7 +47,7 @@ Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:

> NB: The trigger words are automatically appended to the positive prompt for you.
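In plain Diffusers, loading two adapters with independent strengths looks roughly like this (file paths and adapter names are placeholders):

```python
# pipe is a loaded StableDiffusionPipeline, as in the Compel sketch above
pipe.load_lora_weights("loras/first_lora.safetensors", adapter_name="first")
pipe.load_lora_weights("loras/second_lora.safetensors", adapter_name="second")

# adapter_weights correspond to the two strength sliders in the UI
pipe.set_adapters(["first", "second"], adapter_weights=[0.8, 0.4])

image = pipe("lora trigger words, portrait of a young adult woman").images[0]
```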
+## Embeddings

Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:

@@ -57,13 +57,13 @@ Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/

> NB: The trigger token is automatically appended to the negative prompt for you.
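Textual inversion embeddings are a one-line load in Diffusers (illustrative file and token names):

```python
# pipe as above; the token then becomes usable in the negative prompt
pipe.load_textual_inversion("embeddings/fast_negative.pt", token="fast_negative")

image = pipe(
    prompt="portrait of a young adult woman",
    negative_prompt="fast_negative",  # trigger token activates the embedding
).images[0]
```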
+## Styles

[Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.

Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.

+### Anime

The `Anime: *` styles work the best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:

@@ -73,13 +73,15 @@ The `Anime: *` styles work the best with Dreamshaper. When using the anime-speci

Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.
+## Scale

Rescale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever/Real-ESRGAN](https://huggingface.co/ai-forever/Real-ESRGAN). Necessary for high-resolution images.
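A sketch of 4x upscaling with the standalone `RealESRGAN` wrapper from the ai-forever repo (assumes that package is installed and the weights are available; not this Space's internal wiring):

```python
import torch
from PIL import Image
from RealESRGAN import RealESRGAN

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RealESRGAN(device, scale=4)
model.load_weights("weights/RealESRGAN_x4.pth", download=True)

image = Image.open("generated_512.png").convert("RGB")
upscaled = model.predict(image)  # 512x512 -> 2048x2048
upscaled.save("generated_2048.png")
```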
+## Image-to-Image

+The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines.
+
+### Strength

Denoising strength is essentially how much the generation will differ from the input image. A value of `0` will be identical to the original, while `1` will be a completely new image. You may want to also increase the number of inference steps. Only applies to the image-to-image input.

@@ -89,9 +91,19 @@ In an image-to-image pipeline, the input image is used as the initial latent. Wi

For capturing faces, enable `IP-Adapter Face` to use the full-face model. You should use an input image that is mostly a face and it should be high quality. You can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
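In plain Diffusers the two image inputs look roughly like this (standard `h94/IP-Adapter` weights; file names are illustrative, not necessarily the exact files this Space loads):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# "ip-adapter-full-face_sd15.bin" would be the full-face variant
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

init = load_image("init.png")
reference = load_image("reference.png")

image = pipe(
    prompt="portrait of a young adult woman",
    image=init,                  # initial latent
    strength=0.6,                # 0 = keep the input, 1 = ignore it
    ip_adapter_image=reference,  # style/identity guidance
    num_inference_steps=40,
).images[0]
```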
+## ControlNet

+The `🎮 Control` tab enables the [ControlNet](https://github.com/lllyasviel/ControlNet) pipelines. Read the [Diffusers docs](https://huggingface.co/docs/diffusers/using-diffusers/controlnet) to learn more.

+### Annotators

+In ControlNet, the input image is a feature map produced by an _annotator_. These are computer vision models used for tasks like edge detection and pose estimation. ControlNet models are trained to understand these feature maps.

+> NB: Control images will be automatically resized to the nearest multiple of 64 (e.g., 513 -> 512).
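The underlying flow in Diffusers, using the same canny annotator and ControlNet checkpoint this commit preloads (prompt and thresholds are illustrative):

```python
import torch
from controlnet_aux import CannyDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# 1. Annotator: turn the control image into a Canny edge map
canny = CannyDetector()
edges = canny(load_image("control.png"), low_threshold=100, high_threshold=200)

# 2. ControlNet trained to follow Canny feature maps
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16, variant="fp16"
)

# 3. Pipeline: the edge map is passed as `image`
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "Lykon/dreamshaper-8", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
image = pipe("portrait of a young adult woman", image=edges).images[0]
```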
+## Advanced

+### DeepCache

[DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:
* `1`: no caching (default)
@@ -99,14 +111,14 @@ For capturing faces, enable `IP-Adapter Face` to use the full-face model. You sh
* `3`: balanced
* `4`: more speed
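The `Interval` setting maps onto DeepCache's `cache_interval`; with a loaded Diffusers pipeline it is enabled like this (same `DeepCache` helper the app depends on):

```python
from DeepCache import DeepCacheSDHelper

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)  # 3 = "balanced" above
helper.enable()

image = pipe("portrait of a young adult woman").images[0]

helper.disable()  # restore the original UNet forward pass
```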
+### FreeU

[FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions sourced from the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
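FreeU needs no extra weights; in Diffusers it is a single call (the values below are the SD 1.5 settings suggested by the FreeU authors, not necessarily what this Space uses):

```python
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.5, b2=1.6)
image = pipe("portrait of a young adult woman").images[0]
pipe.disable_freeu()  # turn it back off
```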
+### Clip Skip

When enabled, the last CLIP layer is skipped. Can sometimes improve image quality.
+### Tiny VAE

Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
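Swapping the tiny autoencoder into a pipeline is a one-liner (sketch):

```python
import torch
from diffusers import AutoencoderTiny

pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")
```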
README.md
CHANGED
@@ -6,7 +6,7 @@ emoji: 🧨
colorFrom: purple
colorTo: blue
sdk: gradio
-sdk_version: 4.
+sdk_version: 4.44.0
python_version: 3.11.9
app_file: app.py
fullWidth: false
@@ -25,9 +25,6 @@ models:
  - SG161222/Realistic_Vision_V5.1_noVAE
  - XpucT/Deliberate
preload_from_hub: # up to 10
-  - >-
-    ai-forever/Real-ESRGAN
-    RealESRGAN_x2.pth,RealESRGAN_x4.pth
  - >-
    Comfy-Org/stable-diffusion-v1-5-archive
    v1-5-pruned-emaonly-fp16.safetensors
@@ -43,6 +40,9 @@ preload_from_hub: # up to 10
  - >-
    Linaqruf/anything-v3-1
    anything-v3-2.safetensors
+  - >-
+    lllyasviel/control_v11p_sd15_canny
+    diffusion_pytorch_model.fp16.safetensors
  - >-
    Lykon/dreamshaper-8
    feature_extractor/preprocessor_config.json,safety_checker/config.json,scheduler/scheduler_config.json,text_encoder/config.json,text_encoder/model.fp16.safetensors,tokenizer/merges.txt,tokenizer/special_tokens_map.json,tokenizer/tokenizer_config.json,tokenizer/vocab.json,unet/config.json,unet/diffusion_pytorch_model.fp16.safetensors,vae/config.json,vae/diffusion_pytorch_model.fp16.safetensors,model_index.json
@@ -62,6 +62,7 @@ preload_from_hub: # up to 10
Gradio app for Stable Diffusion 1.5 featuring:
* txt2img and img2img pipelines with IP-Adapter
* Curated models, LoRAs, and TI embeddings
+* ControlNet with annotators
* Compel prompt weighting
* dozens of styles and starter prompts
* Multiple samplers with Karras scheduling
@@ -69,12 +70,8 @@ Gradio app for Stable Diffusion 1.5 featuring:
* Real-ESRGAN upscaling
* Optional tiny autoencoder

-There's also a [CLI](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/cli.py).
-
## Motivation

-I want to:
-
* host a free and easy-to-use Stable Diffusion UI on ZeroGPU
* provide the necessary tools for common workflows
* curate useful models, adapters, and embeddings
app.css
CHANGED
@@ -30,7 +30,7 @@
  overflow-y: auto;
}
.gallery, .gallery .grid-wrap {
-  height: calc(100vh -
+  height: calc(100vh - 430px);
  max-height: none;
}

@@ -108,7 +108,7 @@
  content: 'Random prompt';
}
.popover#clear:hover::after {
-  content: 'Clear
+  content: 'Clear';
+}
+.popover#clear-control:hover::after {
+  content: 'Clear';
}
.popover#refresh:hover::after {
  content: var(--seed, "-1");
app.py
CHANGED
@@ -6,13 +6,16 @@ import random
import gradio as gr

from lib import (
+    CannyAnnotator,
    Config,
    async_call,
    disable_progress_bars,
    download_civit_file,
    download_repo_files,
    generate,
+    get_valid_size,
    read_file,
+    resize_image,
)

# the CSS `content` attribute expects a string so we need to wrap the number in quotes
@@ -84,6 +87,15 @@ async def random_fn():
    return gr.Textbox(value=random.choice(prompts))


+# TODO: move this to another file once more annotators are added; will need @GPU decorator
+async def annotate_fn(image, annotator):
+    size = get_valid_size(image)
+    image = resize_image(image, size)
+    if annotator == "canny":
+        canny = CannyAnnotator()
+        return canny(image, size)
+
+
async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
    if len(args) > 0:
        prompt = args[0]
@@ -92,6 +104,7 @@ async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
    if prompt is None or prompt.strip() == "":
        raise gr.Error("You must enter a prompt")

+    # always the last arguments
    DISABLE_IMAGE_PROMPT, DISABLE_IP_IMAGE_PROMPT = args[-2:]
    gen_args = list(args[:-2])
    if DISABLE_IMAGE_PROMPT:
@@ -148,25 +161,24 @@ with gr.Blocks(
    with gr.Tabs():
        with gr.TabItem("🏠 Text"):
            with gr.Column():
+                output_images = gr.Gallery(
+                    elem_classes=["gallery"],
+                    show_share_button=False,
+                    object_fit="cover",
+                    interactive=False,
+                    show_label=False,
+                    label="Output",
+                    format="png",
+                    columns=2,
+                )
+                prompt = gr.Textbox(
+                    placeholder="What do you want to see?",
+                    autoscroll=False,
+                    show_label=False,
+                    label="Prompt",
+                    max_lines=3,
+                    lines=3,
+                )

            # Buttons
            with gr.Row():
@@ -196,72 +208,104 @@
        # img2img tab
        with gr.TabItem("🖼️ Image"):
+            with gr.Row():
+                image_prompt = gr.Image(
+                    show_share_button=False,
+                    label="Initial Image",
+                    min_width=320,
+                    format="png",
+                    type="pil",
+                )
+                ip_image_prompt = gr.Image(
+                    show_share_button=False,
+                    label="IP-Adapter Image",
+                    min_width=320,
+                    format="png",
+                    type="pil",
+                )

+            with gr.Row():
+                image_select = gr.Dropdown(
+                    info="Use an initial image from the gallery",
+                    choices=[("None", -1)],
+                    label="Gallery Image",
+                    interactive=True,
+                    filterable=False,
+                    value=-1,
+                )
+                ip_image_select = gr.Dropdown(
+                    info="Use an IP-Adapter image from the gallery",
+                    label="Gallery Image",
+                    choices=[("None", -1)],
+                    interactive=True,
+                    filterable=False,
+                    value=-1,
+                )

+            with gr.Row():
+                denoising_strength = gr.Slider(
+                    value=Config.DENOISING_STRENGTH,
+                    label="Denoising Strength",
+                    minimum=0.0,
+                    maximum=1.0,
+                    step=0.1,
+                )

+            with gr.Row():
+                disable_image = gr.Checkbox(
+                    elem_classes=["checkbox"],
+                    label="Disable Initial Image",
+                    value=False,
+                )
+                disable_ip_image = gr.Checkbox(
+                    elem_classes=["checkbox"],
+                    label="Disable IP-Adapter Image",
+                    value=False,
+                )
+                use_ip_face = gr.Checkbox(
+                    elem_classes=["checkbox"],
+                    label="Use IP-Adapter Face",
+                    value=False,
+                )

+        # controlnet tab
        with gr.TabItem("🎮 Control"):
+            with gr.Row():
+                control_image_input = gr.Image(
+                    show_share_button=False,
+                    label="Control Image",
+                    min_width=320,
+                    format="png",
+                    type="pil",
+                )
+                control_image_prompt = gr.Image(
+                    interactive=False,
+                    show_share_button=False,
+                    label="Control Image Output",
+                    show_label=False,
+                    min_width=320,
+                    format="png",
+                    type="pil",
+                )
+
+            with gr.Row():
+                control_annotator = gr.Dropdown(
+                    choices=[("Canny", "canny")],
+                    label="Annotator",
+                    filterable=False,
+                    value="canny",
+                )
+
+            with gr.Row():
+                annotate_btn = gr.Button("Annotate", variant="primary")
+                clear_control_btn = gr.ClearButton(
+                    elem_classes=["icon-button", "popover"],
+                    components=[control_image_prompt],
+                    variant="secondary",
+                    elem_id="clear-control",
+                    min_width=0,
+                    value="🗑️",
+                )

        with gr.TabItem("⚙️ Menu"):
            with gr.Group():
@@ -445,6 +489,12 @@
                    value=False,
                )

+    annotate_btn.click(
+        annotate_fn,
+        inputs=[control_image_input, control_annotator],
+        outputs=[control_image_prompt],
+    )
+
    random_btn.click(random_fn, inputs=[], outputs=[prompt], show_api=False)

    refresh_btn.click(None, inputs=[], outputs=[seed], js=refresh_seed_js)
@@ -530,7 +580,7 @@
            negative_prompt,
            image_prompt,
            ip_image_prompt,
+            control_image_prompt,
            lora_1,
            lora_1_weight,
            lora_2,
@@ -540,6 +590,7 @@
            seed,
            model,
            scheduler,
+            control_annotator,
            width,
            height,
            guidance_scale,
@@ -552,6 +603,7 @@
            use_taesd,
            use_freeu,
            use_clip_skip,
+            use_ip_face,
            DISABLE_IMAGE_PROMPT,
            DISABLE_IP_IMAGE_PROMPT,
        ],
lib/__init__.py
CHANGED
@@ -1,3 +1,4 @@
+from .annotators import CannyAnnotator
from .config import Config
from .inference import generate
from .loader import Loader
@@ -9,13 +10,16 @@ from .utils import (
    download_civit_file,
    download_repo_files,
    enable_progress_bars,
+    get_valid_size,
    load_json,
    read_file,
+    resize_image,
    safe_progress,
    timer,
)

__all__ = [
+    "CannyAnnotator",
    "Config",
    "Loader",
    "Logger",
@@ -26,8 +30,10 @@ __all__ = [
    "download_repo_files",
    "enable_progress_bars",
    "generate",
+    "get_valid_size",
    "load_json",
    "read_file",
+    "resize_image",
    "safe_progress",
    "timer",
]
lib/annotators.py
ADDED
@@ -0,0 +1,25 @@
+from threading import Lock
+
+from controlnet_aux import CannyDetector
+
+
+class CannyAnnotator:
+    _instance = None
+    _lock = Lock()
+
+    def __new__(cls):
+        with cls._lock:
+            if cls._instance is None:
+                cls._instance = super().__new__(cls)
+                cls._instance.model = CannyDetector()
+            return cls._instance
+
+    def __call__(self, img, size):
+        resolution = min(*size)
+        return self.model(
+            img,
+            low_threshold=100,
+            high_threshold=200,
+            detect_resolution=resolution,
+            image_resolution=resolution,
+        )
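For context, this is roughly how the annotator is used from `annotate_fn` in `app.py` (sketch with an illustrative input file):

```python
from PIL import Image
from lib import CannyAnnotator, get_valid_size, resize_image

image = Image.open("control.png").convert("RGB")
size = get_valid_size(image)       # snap to multiples of 64 within 512-4096
image = resize_image(image, size)  # numpy array at that size

canny = CannyAnnotator()           # singleton: the detector is built once and reused
edge_map = canny(image, size)      # feature map for the ControlNet pipeline
```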
lib/config.py
CHANGED
@@ -16,7 +16,12 @@ from diffusers import (
from diffusers.utils import logging as diffusers_logging
from transformers import logging as transformers_logging

+from .pipelines import (
+    CustomStableDiffusionControlNetImg2ImgPipeline,
+    CustomStableDiffusionControlNetPipeline,
+    CustomStableDiffusionImg2ImgPipeline,
+    CustomStableDiffusionPipeline,
+)

# improved GPU handling and progress bars; set before importing spaces
os.environ["ZEROGPU_V2"] = "1"
@@ -53,11 +58,14 @@ Config = SimpleNamespace(
    ZERO_GPU=import_module("spaces").config.Config.zero_gpu,
    HF_MODELS={
        # downloaded on startup
+        "ai-forever/Real-ESRGAN": ["RealESRGAN_x2.pth", "RealESRGAN_x4.pth"],
        "Comfy-Org/stable-diffusion-v1-5-archive": ["v1-5-pruned-emaonly-fp16.safetensors"],
        "cyberdelia/CyberRealistic": ["CyberRealistic_V5_FP16.safetensors"],
        "fluently/Fluently-v4": ["Fluently-v4.safetensors"],
        "Linaqruf/anything-v3-1": ["anything-v3-2.safetensors"],
+        "lllyasviel/control_v11p_sd15_canny": ["diffusion_pytorch_model.fp16.safetensors"],
+        "Lykon/dreamshaper-8": [*_sd_files],
+        "madebyollin/taesd": ["diffusion_pytorch_model.safetensors"],
        "prompthero/openjourney-v4": ["openjourney-v4.ckpt"],
        "SG161222/Realistic_Vision_V5.1_noVAE": ["Realistic_Vision_V5.1_fp16-no-ema.safetensors"],
        "XpucT/Deliberate": ["Deliberate_v6.safetensors"],
@@ -89,6 +97,8 @@
    PIPELINES={
        "txt2img": CustomStableDiffusionPipeline,
        "img2img": CustomStableDiffusionImg2ImgPipeline,
+        "controlnet_txt2img": CustomStableDiffusionControlNetPipeline,
+        "controlnet_img2img": CustomStableDiffusionControlNetImg2ImgPipeline,
    },
    MODEL="Lykon/dreamshaper-8",
    MODELS=[
@@ -121,6 +131,9 @@ Config = SimpleNamespace(
        "PNDM": PNDMScheduler,
        "UniPC 2M": UniPCMultistepScheduler,
    },
+    ANNOTATORS={
+        "canny": "lllyasviel/control_v11p_sd15_canny",
+    },
    EMBEDDING="fast_negative",
    EMBEDDINGS=[
        "cyberrealistic_negative",
lib/inference.py
CHANGED
@@ -98,7 +98,7 @@ def generate(
    negative_prompt="",
    image_prompt=None,
    ip_image_prompt=None,
+    control_image_prompt=None,
    lora_1=None,
    lora_1_weight=0.0,
    lora_2=None,
@@ -108,6 +108,7 @@
    seed=None,
    model="Lykon/dreamshaper-8",
    scheduler="DDIM",
+    annotator="canny",
    width=512,
    height=512,
    guidance_scale=7.5,
@@ -120,6 +121,7 @@
    taesd=False,
    freeu=False,
    clip_skip=False,
+    ip_face=False,
    Error=Exception,
    Info=None,
    progress=None,
@@ -142,6 +144,10 @@
    CURRENT_IMAGE = 1

    KIND = "img2img" if image_prompt is not None else "txt2img"
+    KIND = f"controlnet_{KIND}" if control_image_prompt is not None else KIND
+
+    if KIND.startswith("controlnet_") and annotator.lower() not in Config.ANNOTATORS.keys():
+        raise Error(f"Invalid annotator: {annotator}")

    EMBEDDINGS_TYPE = (
        ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED
@@ -174,6 +180,7 @@
        IP_ADAPTER,
        model,
        scheduler,
+        annotator,
        deepcache,
        scale,
        karras,
@@ -293,6 +300,13 @@
        kwargs["strength"] = denoising_strength
        kwargs["image"] = prepare_image(image_prompt, (width, height))

+    if KIND == "controlnet_txt2img":
+        # don't resize controlnet images
+        kwargs["image"] = prepare_image(control_image_prompt, None)
+
+    if KIND == "controlnet_img2img":
+        kwargs["control_image"] = prepare_image(control_image_prompt, None)
+
    if IP_ADAPTER:
        # don't resize full-face images since they are usually square crops
        size = None if ip_face else (width, height)
lib/loader.py
CHANGED
@@ -3,6 +3,7 @@ from threading import Lock

import torch
from DeepCache import DeepCacheSDHelper
+from diffusers import ControlNetModel
from diffusers.models import AutoencoderKL, AutoencoderTiny
from diffusers.models.attention_processor import AttnProcessor2_0, IPAdapterAttnProcessor2_0

@@ -23,6 +24,7 @@ class Loader:
        cls._instance.pipe = None
        cls._instance.model = None
        cls._instance.upscaler = None
+        cls._instance.controlnet = None
        cls._instance.ip_adapter = None
        cls._instance.log = Logger("Loader")
        return cls._instance
@@ -75,15 +77,36 @@
            return True
        return False

+    def _should_unload_controlnet(self, kind="", controlnet=""):
+        if self.controlnet is None:
+            return False
+        if self.controlnet.lower() != controlnet.lower():
+            return True
+        if not kind.startswith("controlnet_"):
+            return True
+        return False
+
+    def _should_unload_pipeline(self, kind="", model="", controlnet=""):
        if self.pipe is None:
            return False
        if self.model.lower() != model.lower():
            return True
        if kind == "txt2img" and not isinstance(self.pipe, Config.PIPELINES["txt2img"]):
+            return True
        if kind == "img2img" and not isinstance(self.pipe, Config.PIPELINES["img2img"]):
+            return True
+        if kind == "controlnet_txt2img" and not isinstance(
+            self.pipe,
+            Config.PIPELINES["controlnet_txt2img"],
+        ):
+            return True
+        if kind == "controlnet_img2img" and not isinstance(
+            self.pipe,
+            Config.PIPELINES["controlnet_img2img"],
+        ):
+            return True
+        if self._should_unload_controlnet(kind, controlnet):
+            return True
        return False

    def _unload_upscaler(self):
@@ -128,7 +151,16 @@
        with timer(f"Unloading {self.model}", logger=self.log.info):
            self.pipe.to("cpu")

+    def _unload(
+        self,
+        kind="",
+        model="",
+        controlnet="",
+        ip_adapter="",
+        deepcache=1,
+        scale=1,
+        freeu=False,
+    ):
        to_unload = []
        if self._should_unload_deepcache(deepcache):  # remove deepcache first
            self._unload_deepcache()
@@ -144,7 +176,10 @@
            self._unload_ip_adapter()
            to_unload.append("ip_adapter")

+        if self._should_unload_controlnet(kind, controlnet):
+            to_unload.append("controlnet")
+
+        if self._should_unload_pipeline(kind, model, controlnet):
            self._unload_pipeline()
            to_unload.append("model")
            to_unload.append("pipe")
@@ -288,6 +323,7 @@
        ip_adapter,
        model,
        scheduler,
+        annotator,
        deepcache,
        scale,
        karras,
@@ -336,7 +372,15 @@
        # defaults to float32
        pipe_kwargs["torch_dtype"] = torch.float16

+        if kind.startswith("controlnet_"):
+            pipe_kwargs["controlnet"] = ControlNetModel.from_pretrained(
+                Config.ANNOTATORS[annotator],
+                torch_dtype=torch.float16,
+                variant="fp16",
+            )
+            self.controlnet = annotator
+
+        self._unload(kind, model, annotator, ip_adapter, deepcache, scale, freeu)
        self._load_pipeline(kind, model, progress, **pipe_kwargs)

        # error loading model
lib/pipelines.py
CHANGED
@@ -1,7 +1,12 @@
import os
from importlib import import_module

+from diffusers import (
+    StableDiffusionControlNetImg2ImgPipeline,
+    StableDiffusionControlNetPipeline,
+    StableDiffusionImg2ImgPipeline,
+    StableDiffusionPipeline,
+)
from diffusers.loaders.single_file import (
    SINGLE_FILE_OPTIONAL_COMPONENTS,
    load_single_file_sub_model,
@@ -220,3 +225,17 @@ class CustomStableDiffusionPipeline(CustomDiffusionMixin, StableDiffusionPipelin

class CustomStableDiffusionImg2ImgPipeline(CustomDiffusionMixin, StableDiffusionImg2ImgPipeline):
    pass
+
+
+class CustomStableDiffusionControlNetPipeline(
+    CustomDiffusionMixin,
+    StableDiffusionControlNetPipeline,
+):
+    pass
+
+
+class CustomStableDiffusionControlNetImg2ImgPipeline(
+    CustomDiffusionMixin,
+    StableDiffusionControlNetImg2ImgPipeline,
+):
+    pass
lib/utils.py
CHANGED
@@ -7,11 +7,14 @@ from contextlib import contextmanager
from typing import Callable, TypeVar

import anyio
+import cv2
import httpx
+import numpy as np
from anyio import Semaphore
from diffusers.utils import logging as diffusers_logging
from huggingface_hub._snapshot_download import snapshot_download
from huggingface_hub.utils import are_progress_bars_disabled
+from PIL import Image
from transformers import logging as transformers_logging
from typing_extensions import ParamSpec

@@ -107,6 +110,63 @@ def download_civit_file(lora_id, version_id, file_path=".", token=None):
        log.error(f"RequestError: {e}")


+# resize an image while preserving the aspect ratio (size is width-first)
+def resize_image(image, size):
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+
+    H, W, _ = image.shape
+    W = float(W)
+    H = float(H)
+    target_W, target_H = size
+
+    # Use the smaller scaling factor to maintain the aspect ratio.
+    k_w = float(target_W) / W
+    k_h = float(target_H) / H
+    k = min(k_w, k_h)
+
+    new_W = int(np.round(W * k / 64.0)) * 64
+    new_H = int(np.round(H * k / 64.0)) * 64
+    img = cv2.resize(
+        image,
+        (new_W, new_H),
+        interpolation=cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA,
+    )
+    return img
+
+
+# ensure image is within bounds
+def get_valid_size(image, step=64, low=512, high=4096):
+    def round_down(x, step=step):
+        return int((x // step) * step)
+
+    def clamp_range(x, low=low, high=high):
+        return max(low, min(x, high))
+
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+
+    H, W = image.shape[:2]
+    ar = W / H
+
+    # try width first
+    if W > H:
+        new_W = round_down(clamp_range(W))
+        new_H = round_down(new_W / ar)
+    else:
+        new_H = round_down(clamp_range(H))
+        new_W = round_down(new_H * ar)
+
+    # if the new size is out of bounds, try the other dimension
+    if new_W < low or new_W > high:
+        new_W = round_down(clamp_range(W))
+        new_H = round_down(new_W / ar)
+    if new_H < low or new_H > high:
+        new_H = round_down(clamp_range(H))
+        new_W = round_down(new_H * ar)
+    return (new_W, new_H)
+
+
# like the original but supports args and kwargs instead of a dict
# https://github.com/huggingface/huggingface-inference-toolkit/blob/0.2.0/src/huggingface_inference_toolkit/async_utils.py
async def async_call(fn: Callable[P, T], *args: P.args, **kwargs: P.kwargs) -> T:
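A quick worked example of the two helpers above (illustrative file name): a 513x768 input snaps to 512x768, matching the DOCS note about rounding to multiples of 64.

```python
from PIL import Image
from lib import get_valid_size, resize_image

image = Image.open("513x768.png").convert("RGB")
size = get_valid_size(image)         # (512, 768), width-first
resized = resize_image(image, size)  # numpy array with shape (768, 512, 3)
```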
requirements.txt
CHANGED
@@ -1,5 +1,6 @@
anyio==4.6.0
compel==2.0.3
+controlnet-aux==0.0.9
deepcache==0.1.1
diffusers==0.30.3
einops==0.8.0
@@ -7,7 +8,9 @@ gradio==4.44.0
h2
hf-transfer
httpx
+mediapipe
numpy==1.26.4
+opencv-contrib-python
peft
ruff==0.6.7
spaces==0.30.2