Create evo_nishikie_v1.py

by yuki-imajuku - opened Jul 11

base: refs/heads/main ← from: refs/pr/1

Files changed (7): +61 -169

.gitattributes +0 -1
README.md +30 -65
config.json +0 -57
diffusion_pytorch_model.safetensors +0 -3
evo_nishikie_v1.py +31 -31
requirements.txt +0 -9
test.jpg +0 -3

.gitattributes CHANGED

@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-test.jpg filter=lfs diff=lfs merge=lfs -text


README.md CHANGED

@@ -3,22 +3,26 @@ library_name: diffusers
 license: apache-2.0
 language:
 - ja
-pipeline_tag: image-to-image
 tags:
 - stable-diffusion
 ---
-# 🐟 Evo-Nishikie-v1
-🤗 [Models](https://huggingface.co/SakanaAI) | 📝 [Blog](https://sakana.ai/evo-ukiyoe/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
-**Evo-Nishikie-v1** is an experimental education-purpose Ukiyoe colorization model. The model is a ControlNet trained with [Evo-Ukiyoe](https://huggingface.co/SakanaAI/Evo-Ukiyoe-v1/).
-The dataset used to train the model came from Ukiyoe images belonged to [Ritsumeikan University, Art Research Center](https://www.arc.ritsumei.ac.jp/). The sample data is belonged to the [Pre-Modern Japanese Text dataset](http://codh.rois.ac.jp/pmjt/),
-owned by [National Institute of Japanese Literature](https://www.nijl.ac.jp/en/) and curated by [ROIS-DS Center for Open Data in the Humanities](http://codh.rois.ac.jp/).
-Please refer to our [blog](https://sakana.ai/evo-ukiyoe/) for more details.
 ## Usage
@@ -31,53 +35,21 @@ Use the code below to get started with the model.
 1. Git clone this model card
    ```
-   git clone https://huggingface.co/SakanaAI/Evo-Nishikie-v1
-   ```
-2. Install git-lfs if you don't have it yet.
-   ```
-   sudo apt install git-lfs
-   git lfs install
    ```
-3. Create conda env
    ```
-   conda create -n evo-nishikie python=3.11
-   conda activate evo-nishikie
-   ```
-4. Install packages
-   ```
-   cd Evo-Nishikie-v1
    pip install -r requirements.txt
    ```
-5. Run
    ```python
-   import torch
-   from io import BytesIO
-   from PIL import Image
-   import requests
-   from evo_nishikie_v1 import load_evo_nishikie
-   #Get image from URL
-   #url = "https://huggingface.co/spaces/SakanaAI/Evo-Nishikie/resolve/main/sample2.jpg"
-   #original_image = Image.open(BytesIO(requests.get(url).content))
-   #Use local image
-   original_image = Image.open('test.jpg')
-   # Generate
-   device = "cuda"
-   pipe, processor = load_evo_nishikie(device)
-   images = pipe(
-        prompt="着物を着た女性が、赤ん坊を抱え、もう一人の子どもが手押し車を引いています。背景には木があります。最高品質の輻の浮世絵。超詳細。",
-        negative_prompt="暗い",
-        image=processor(original_image),
-        guidance_scale=7.0,
-        controlnet_conditioning_scale=0.8,
-        num_inference_steps=35,
-        num_images_per_prompt=1,
-        output_type="pil",
-   ).images
-   images[0].save("out.png")
    ```
 </details>
@@ -89,15 +61,14 @@ Use the code below to get started with the model.
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [Sakana AI](https://sakana.ai/)
-- **Model type:** Diffusion-based image-to-image generative model
 - **Language(s):** Japanese
-- **Blog:** https://sakana.ai/evo-ukiyoe
 ## License
 The Python script included in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
-Please note that the license for the model/pipeline generated by this script is inherited from the source models.
-The sample images used in the code has CC BY SA 4.0 license and is belonged to the Pre-Modern Japanese Text dataset, owned by National Institute of Japanese Literature and curated by ROIS-DS Center for Open Data in the Humanities.
 ## Uses
 This model is provided for research and development purposes only and should be considered as an experimental prototype.
@@ -109,17 +80,11 @@ Users must fully understand the risks associated with the use of this model and
 ## Acknowledgement
-Evo-Nishikie was trained based on Evo-Ukiyoe and Evo-Ukiyoe was trained based on Evo-SDXL-JP. We would like to thank the developers of Evo-SDXL-JP source models for their contributions and for making their work available.
-- [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
-- [Juggernaut-XL-v9](https://huggingface.co/RunDiffusion/Juggernaut-XL-v9)
-- [SDXL-DPO](https://huggingface.co/mhdang/dpo-sdxl-text2image-v1)
-- [JSDXL](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl)
 ## Citation
-    @misc{Evo-Ukiyoe,
-    url    = {[https://huggingface.co/SakanaAI/Evo-Nishikie-v1](https://huggingface.co/SakanaAI/Evo-Nishikie-v1)},
-    title  = {Evo-Nishikie},
-    author = {Clanuwat, Tarin and Shing, Makoto and Imajuku, Yuki and Kitamoto, Asanobu and Akama, Ryo}
-    }

 license: apache-2.0
 language:
 - ja
+pipeline_tag: text-to-image
 tags:
 - stable-diffusion
 ---
+# 🐟 Evo-Ukiyoe-v1
+🤗 [Models](https://huggingface.co/SakanaAI) | 📝 [Blog](TODO) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
+**EvoSDXL-JP-v1** is an experimental education-purpose Japanese SDXL Lightning.
+This model was created using the Evolutionary Model Merge method.
+Please refer to our [report](https://arxiv.org/abs/2403.13187) and [blog](https://sakana.ai/evosdxl-jp/) for more details.
+This model was produced by merging the following models.
+We are grateful to the developers of the source models.
+- [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+- [Juggernaut-XL-v9](https://huggingface.co/RunDiffusion/Juggernaut-XL-v9)
+- [SDXL-DPO](https://huggingface.co/mhdang/dpo-sdxl-text2image-v1)
+- [JSDXL](https://huggingface.co/stabilityai/japanese-stable-diffusion-xl)
+- [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)
 ## Usage
 1. Git clone this model card
    ```
+   git clone https://huggingface.co/SakanaAI/EvoSDXL-JP-v1
    ```
+2. Install packages
    ```
+   cd EvoSDXL-JP-v1
    pip install -r requirements.txt
    ```
+3. Run
    ```python
+   from evosdxl_jp_v1 import load_evosdxl_jp
+   prompt = "柴犬"
+   pipe = load_evosdxl_jp(device="cuda")
+   images = pipe(prompt, num_inference_steps=4, guidance_scale=0).images
+   images[0].save("image.png")
    ```
 </details>
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [Sakana AI](https://sakana.ai/)
+- **Model type:** Diffusion-based text-to-image generative model
 - **Language(s):** Japanese
+- **Blog:** https://sakana.ai/TODO
 ## License
 The Python script included in this repository is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+Please note that the license for the model/pipeline generated by this script is inherited from the source models.
 ## Uses
 This model is provided for research and development purposes only and should be considered as an experimental prototype.
 ## Acknowledgement
+We would like to thank the developers of the source models for their contributions and for making their work available.
 ## Citation
+```bibtex
+TODO
+```

config.json DELETED

@@ -1,57 +0,0 @@
-{
-  "_class_name": "ControlNetModel",
-  "_diffusers_version": "0.29.0",
-  "act_fn": "silu",
-  "addition_embed_type": "text_time",
-  "addition_embed_type_num_heads": 64,
-  "addition_time_embed_dim": 256,
-  "attention_head_dim": [
-    5,
-    10,
-    20
-  ],
-  "block_out_channels": [
-    320,
-    640,
-    1280
-  ],
-  "class_embed_type": null,
-  "conditioning_channels": 3,
-  "conditioning_embedding_out_channels": [
-    16,
-    32,
-    96,
-    256
-  ],
-  "controlnet_conditioning_channel_order": "rgb",
-  "cross_attention_dim": 2048,
-  "down_block_types": [
-    "DownBlock2D",
-    "CrossAttnDownBlock2D",
-    "CrossAttnDownBlock2D"
-  ],
-  "downsample_padding": 1,
-  "encoder_hid_dim": null,
-  "encoder_hid_dim_type": null,
-  "flip_sin_to_cos": true,
-  "freq_shift": 0,
-  "global_pool_conditions": false,
-  "in_channels": 4,
-  "layers_per_block": 2,
-  "mid_block_scale_factor": 1,
-  "mid_block_type": "UNetMidBlock2DCrossAttn",
-  "norm_eps": 1e-05,
-  "norm_num_groups": 32,
-  "num_attention_heads": null,
-  "num_class_embeds": null,
-  "only_cross_attention": false,
-  "projection_class_embeddings_input_dim": 2816,
-  "resnet_time_scale_shift": "default",
-  "transformer_layers_per_block": [
-    1,
-    2,
-    10
-  ],
-  "upcast_attention": null,
-  "use_linear_projection": true
-}
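
For readers checking the deleted ControlNet config against SDXL conventions, the following is a small sketch of how `projection_class_embeddings_input_dim` relates to `addition_time_embed_dim`. It assumes the usual SDXL conditioning layout (a 1280-dimensional pooled text embedding plus six size/crop values, each embedded with `addition_time_embed_dim` channels). Those two constants and the helper name `expected_projection_dim` are not part of this diff.

```python
# Values copied from the deleted config.json.
config = {
    "addition_embed_type": "text_time",
    "addition_time_embed_dim": 256,
    "projection_class_embeddings_input_dim": 2816,
}

# Assumptions (standard SDXL layout, not stated in this diff):
POOLED_TEXT_EMBED_DIM = 1280  # pooled output of the second text encoder
NUM_TIME_IDS = 6  # original size, crop offsets, target size (2 values each)


def expected_projection_dim(cfg: dict) -> int:
    """Width of the concatenated text + time conditioning vector."""
    return POOLED_TEXT_EMBED_DIM + NUM_TIME_IDS * cfg["addition_time_embed_dim"]


expected = expected_projection_dim(config)
actual = config["projection_class_embeddings_input_dim"]
print(f"expected={expected} actual={actual} match={expected == actual}")
```

If the two numbers disagree for some other config, the ControlNet was probably not built for an SDXL-style base model.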

diffusion_pytorch_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:50513516f08fe6185d469218e69cc2d42ba59dc56eb315d50d096e2921ad6ce1
-size 5004167864
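
The three removed lines are a Git LFS pointer, not the weights themselves. As a rough illustration of what the pointer encodes, the sketch below parses it with the standard library and converts the size to gigabytes. The helper name `parse_lfs_pointer` is hypothetical.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer content copied from the deleted file in this diff.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:50513516f08fe6185d469218e69cc2d42ba59dc56eb315d50d096e2921ad6ce1
size 5004167864
"""

pointer = parse_lfs_pointer(pointer_text)
size_gb = int(pointer["size"]) / 1e9  # decimal gigabytes
print(f"{pointer['oid']} ({size_gb:.2f} GB)")
```

At roughly 5 GB, the file deleted here is the full ControlNet checkpoint.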

evo_nishikie_v1.py CHANGED

@@ -1,22 +1,23 @@
 import gc
 import os
-from typing import Dict, List, Tuple, Union
-from PIL import Image, ImageFilter
-from controlnet_aux import LineartDetector
 from diffusers import (
     ControlNetModel,
     StableDiffusionXLControlNetPipeline,
     UNet2DConditionModel,
 )
 from huggingface_hub import hf_hub_download
 import safetensors
 import torch
 from tqdm import tqdm
 from transformers import AutoTokenizer, CLIPTextModelWithProjection
-# Base models
 SDXL_REPO = "stabilityai/stable-diffusion-xl-base-1.0"
 DPO_REPO = "mhdang/dpo-sdxl-text2image-v1"
 JN_REPO = "RunDiffusion/Juggernaut-XL-v9"
@@ -29,18 +30,6 @@ UKIYOE_REPO = "SakanaAI/Evo-Ukiyoe-v1"
 NISHIKIE_REPO = "SakanaAI/Evo-Nishikie-v1"
-class EvoNishikieConditioningImageProcessor:
-    def __init__(self, device="cpu"):
-        self.lineart_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to(device)
-        self.image_filter = ImageFilter.MedianFilter(size=3)
-    def __call__(self, original_image: Image.Image) -> Image.Image:
-        lineart_image = self.lineart_detector(original_image, coarse=False, image_resolution=1024)
-        lineart_image_filtered = lineart_image.filter(self.image_filter)
-        conditioning_image = lineart_image_filtered.point(lambda p: 255 if p > 40 else 0).convert("L")
-        return conditioning_image
 def load_state_dict(checkpoint_file: Union[str, os.PathLike], device: str = "cpu"):
     file_extension = os.path.basename(checkpoint_file).split(".")[-1]
     if file_extension == "safetensors":
@@ -125,9 +114,7 @@ def split_conv_attn(weights):
     return {"conv": conv_tensors, "attn": attn_tensors}
-def load_evo_nishikie(device="cuda", processor_device="cpu") -> Tuple[
-    StableDiffusionXLControlNetPipeline, EvoNishikieConditioningImageProcessor
-]:
     # Load base models
     sdxl_weights = split_conv_attn(load_from_pretrained(SDXL_REPO, device=device))
     dpo_weights = split_conv_attn(
@@ -137,7 +124,6 @@ def load_evo_nishikie(device="cuda", processor_device="cpu") -> Tuple[
     )
     jn_weights = split_conv_attn(load_from_pretrained(JN_REPO, device=device))
     jsdxl_weights = split_conv_attn(load_from_pretrained(JSDXL_REPO, device=device))
     # Merge base models
     tensors = [sdxl_weights, dpo_weights, jn_weights, jsdxl_weights]
     new_conv = merge_models(
@@ -158,14 +144,11 @@ def load_evo_nishikie(device="cuda", processor_device="cpu") -> Tuple[
             0.2198623756106564,
         ],
     )
-    # Delete no longer needed variables to free
     del sdxl_weights, dpo_weights, jn_weights, jsdxl_weights
     gc.collect()
     if "cuda" in device:
         torch.cuda.empty_cache()
-    # Instantiate UNet
     unet_config = UNet2DConditionModel.load_config(SDXL_REPO, subfolder="unet")
     unet = UNet2DConditionModel.from_config(unet_config).to(device=device)
     unet.load_state_dict({**new_conv, **new_attn})
@@ -193,14 +176,31 @@ def load_evo_nishikie(device="cuda", processor_device="cpu") -> Tuple[
         torch_dtype=torch.float16,
         variant="fp16",
     )
     # Load Evo-Ukiyoe weights
     pipe.load_lora_weights(UKIYOE_REPO)
     pipe.fuse_lora(lora_scale=1.0)
-    pipe = pipe.to(device, dtype=torch.float16)
-    # Load conditioning image processor
-    processor = EvoNishikieConditioningImageProcessor(device=processor_device)
-    return pipe, processor
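
The removed `EvoNishikieConditioningImageProcessor` chains three steps: line-art detection, a 3x3 median filter, and a fixed threshold at 40. The sketch below illustrates only the last two steps on a tiny synthetic grayscale grid, using pure Python instead of `LineartDetector` and PIL so that it runs without model downloads. The function names, the nested-list image format, and the edge-replication border handling are assumptions made for illustration.

```python
from statistics import median
from typing import List

Gray = List[List[int]]  # rows of 0-255 intensities


def median_filter_3x3(img: Gray) -> Gray:
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Borders are handled by replicating edge pixels (an assumption).
    """
    h, w = len(img), len(img[0])
    out: Gray = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(img: Gray, threshold: int = 40) -> Gray:
    """Map pixels above the threshold to 255 and everything else to 0."""
    return [[255 if p > threshold else 0 for p in row] for row in img]


# Synthetic stand-in for a line-art image: faint background (30),
# one 3-pixel-wide stroke (200), and one isolated bright speck (255).
lineart = [[30] * 12 for _ in range(9)]
for y in range(9):
    for x in (3, 4, 5):
        lineart[y][x] = 200
lineart[4][9] = 255

conditioning = binarize(median_filter_3x3(lineart))
for row in conditioning:
    print("".join("#" if p else "." for p in row))
```

The median filter removes the isolated speck before thresholding, and the threshold then drops the faint background, so only the stroke survives in the conditioning image.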

 import gc
+from io import BytesIO
 import os
+from typing import Dict, List, Union
+from PIL import Image
+from controlnet_aux import CannyDetector
 from diffusers import (
     ControlNetModel,
     StableDiffusionXLControlNetPipeline,
     UNet2DConditionModel,
 )
 from huggingface_hub import hf_hub_download
+import requests
 import safetensors
 import torch
 from tqdm import tqdm
 from transformers import AutoTokenizer, CLIPTextModelWithProjection
+# Base models (fine-tuned from SDXL-1.0)
 SDXL_REPO = "stabilityai/stable-diffusion-xl-base-1.0"
 DPO_REPO = "mhdang/dpo-sdxl-text2image-v1"
 JN_REPO = "RunDiffusion/Juggernaut-XL-v9"
 NISHIKIE_REPO = "SakanaAI/Evo-Nishikie-v1"
 def load_state_dict(checkpoint_file: Union[str, os.PathLike], device: str = "cpu"):
     file_extension = os.path.basename(checkpoint_file).split(".")[-1]
     if file_extension == "safetensors":
     return {"conv": conv_tensors, "attn": attn_tensors}
+def load_evo_nishikie(device="cuda") -> StableDiffusionXLControlNetPipeline:
     # Load base models
     sdxl_weights = split_conv_attn(load_from_pretrained(SDXL_REPO, device=device))
     dpo_weights = split_conv_attn(
     )
     jn_weights = split_conv_attn(load_from_pretrained(JN_REPO, device=device))
     jsdxl_weights = split_conv_attn(load_from_pretrained(JSDXL_REPO, device=device))
     # Merge base models
     tensors = [sdxl_weights, dpo_weights, jn_weights, jsdxl_weights]
     new_conv = merge_models(
             0.2198623756106564,
         ],
     )
     del sdxl_weights, dpo_weights, jn_weights, jsdxl_weights
     gc.collect()
     if "cuda" in device:
         torch.cuda.empty_cache()
     unet_config = UNet2DConditionModel.load_config(SDXL_REPO, subfolder="unet")
     unet = UNet2DConditionModel.from_config(unet_config).to(device=device)
     unet.load_state_dict({**new_conv, **new_attn})
         torch_dtype=torch.float16,
         variant="fp16",
     )
+    pipe = pipe.to(device, dtype=torch.float16)
     # Load Evo-Ukiyoe weights
     pipe.load_lora_weights(UKIYOE_REPO)
     pipe.fuse_lora(lora_scale=1.0)
+    return pipe
+if __name__ == "__main__":
+    url = "https://sakana.ai/assets/nedo-grant/nedo_grant.jpeg"
+    original_image = Image.open(
+        BytesIO(requests.get(url).content)
+    ).resize((1024, 1024), Image.Resampling.LANCZOS)
+    canny_detector = CannyDetector()
+    canny_image = canny_detector(original_image, image_resolution=1024)
+    pipe: StableDiffusionXLControlNetPipeline = load_evo_nishikie()
+    images = pipe(
+        prompt="銀杏が色づく。草木が生えた地面と青空の富士山。最高品質の輻の浮世絵。",
+        negative_prompt="暗い。",
+        image=canny_image,
+        guidance_scale=8.0,
+        controlnet_conditioning_scale=0.6,
+        num_inference_steps=50,
+        generator=torch.Generator().manual_seed(0),
+        num_images_per_prompt=1,
+        output_type="pil",
+    ).images
+    images[0].save("out.png")
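
The body of `merge_models` is not part of this diff; only its call with a list of per-model weights is visible in `load_evo_nishikie`. The sketch below shows one plausible reading, a weighted sum over parameters with matching names. The function name `weighted_merge`, the use of plain lists instead of tensors, and the lack of weight normalization are all assumptions and may differ from the actual script.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]  # parameter name -> flattened values


def weighted_merge(models: List[StateDict], weights: List[float]) -> StateDict:
    """Combine same-named parameters as sum(weight_i * param_i)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    merged: StateDict = {}
    for name in models[0]:
        columns = zip(*(model[name] for model in models))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) for values in columns
        ]
    return merged


# Toy example with two "models" that each hold a single parameter.
model_a = {"conv.weight": [1.0, 2.0]}
model_b = {"conv.weight": [3.0, 4.0]}
merged = weighted_merge([model_a, model_b], [0.25, 0.75])
print(merged)
```

In the script, the conv and attention groups returned by `split_conv_attn` are merged separately (`new_conv`, `new_attn`), which would correspond to calling a function like this once per group with its own weight list.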

requirements.txt DELETED

@@ -1,9 +0,0 @@
-torch
-torchvision
-accelerate==0.32.0
-controlnet-aux==0.0.9
-diffusers==0.29.2
-sentencepiece==0.2.0
-transformers==4.42.3
-peft==0.11.1

test.jpg DELETED

Git LFS Details

SHA256: 52803116a60fd53f8ed5f621f4296c66a3c761dce9589b6cdfb6acff6b269ab2
Pointer size: 132 Bytes
Size of remote file: 1.33 MB