Apply formatter
- .pre-commit-config.yaml +55 -0
- .vscode/settings.json +21 -0
- LICENSE +1 -1
- README.md +8 -11
- edit_app.py +15 -8
- requirements.txt +2 -2
.pre-commit-config.yaml
ADDED
@@ -0,0 +1,55 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+      - id: check-executables-have-shebangs
+      - id: check-json
+      - id: check-merge-conflict
+      - id: check-shebang-scripts-are-executable
+      - id: check-toml
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: mixed-line-ending
+        args: ["--fix=lf"]
+      - id: requirements-txt-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/myint/docformatter
+    rev: v1.7.5
+    hooks:
+      - id: docformatter
+        args: ["--in-place"]
+  - repo: https://github.com/pycqa/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        args: ["--profile", "black"]
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.5.1
+    hooks:
+      - id: mypy
+        args: ["--ignore-missing-imports"]
+        additional_dependencies:
+          ["types-python-slugify", "types-requests", "types-PyYAML"]
+  - repo: https://github.com/psf/black
+    rev: 23.9.1
+    hooks:
+      - id: black
+        language_version: python3.10
+        args: ["--line-length", "119"]
+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.6.1
+    hooks:
+      - id: nbstripout
+        args:
+          [
+            "--extra-keys",
+            "metadata.interpreter metadata.kernelspec cell.metadata.pycharm",
+          ]
+  - repo: https://github.com/nbQA-dev/nbQA
+    rev: 1.7.0
+    hooks:
+      - id: nbqa-black
+      - id: nbqa-pyupgrade
+        args: ["--py37-plus"]
+      - id: nbqa-isort
+        args: ["--float-to-top"]
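Once this file is in place, the hooks are typically activated locally with `pre-commit install` and applied to the whole repository with `pre-commit run --all-files`; the `rev` fields pin the tool versions (Black 23.9.1, isort 5.12.0, mypy v1.5.1, and so on) so every contributor formats with the same toolchain.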
.vscode/settings.json
ADDED
@@ -0,0 +1,21 @@
+{
+  "[python]": {
+    "editor.defaultFormatter": "ms-python.black-formatter",
+    "editor.formatOnType": true,
+    "editor.codeActionsOnSave": {
+      "source.organizeImports": true
+    }
+  },
+  "black-formatter.args": [
+    "--line-length=119"
+  ],
+  "isort.args": ["--profile", "black"],
+  "flake8.args": [
+    "--max-line-length=119"
+  ],
+  "ruff.args": [
+    "--line-length=119"
+  ],
+  "editor.formatOnSave": true,
+  "files.insertFinalNewline": true
+}
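These editor settings appear to mirror the hooks above: Black, flake8, and Ruff all use the 119-character line length and isort uses the Black profile, so format-on-save in VS Code should produce the same output as the pre-commit checks.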
LICENSE
CHANGED
@@ -6,4 +6,4 @@ The above copyright notice and this permission notice shall be included in all c
 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
-Portions of code and models (such as pretrained checkpoints, which are fine-tuned starting from released Stable Diffusion checkpoints) are derived from the Stable Diffusion codebase (https://github.com/CompVis/stable-diffusion). Further restrictions may apply. Please consult the Stable Diffusion license `stable_diffusion/LICENSE`. Modified code is denoted as such in comments at the start of each file.
+Portions of code and models (such as pretrained checkpoints, which are fine-tuned starting from released Stable Diffusion checkpoints) are derived from the Stable Diffusion codebase (https://github.com/CompVis/stable-diffusion). Further restrictions may apply. Please consult the Stable Diffusion license `stable_diffusion/LICENSE`. Modified code is denoted as such in comments at the start of each file.
README.md
CHANGED
@@ -10,16 +10,16 @@ pinned: false
 ### [Project Page](https://www.timothybrooks.com/instruct-pix2pix/) | [Paper](https://arxiv.org/abs/2211.09800) | [Data](http://instruct-pix2pix.eecs.berkeley.edu/)
 PyTorch implementation of InstructPix2Pix, an instruction-based image editing model, based on the original [CompVis/stable_diffusion](https://github.com/CompVis/stable-diffusion) repo. <br>
 
-[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://www.timothybrooks.com/instruct-pix2pix/)
+[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://www.timothybrooks.com/instruct-pix2pix/)
 [Tim Brooks](https://www.timothybrooks.com/)\*,
 [Aleksander Holynski](https://holynski.org/)\*,
 [Alexei A. Efros](https://people.eecs.berkeley.edu/~efros/) <br>
 UC Berkeley <br>
-\*denotes equal contribution
-
+\*denotes equal contribution
+
 <img src='https://instruct-pix2pix.timothybrooks.com/teaser.jpg'/>
 
-## TL;DR: quickstart
+## TL;DR: quickstart
 
 Set up a conda environment, and download a pretrained model:
 ```
@@ -38,7 +38,7 @@ python edit_cli.py --input imgs/example.jpg --output imgs/output.jpg --edit "tur
 
 Or launch your own interactive editing Gradio app:
 ```
-python edit_app.py
+python edit_app.py
 ```
 ![Edit app](https://github.com/timothybrooks/instruct-pix2pix/blob/main/imgs/edit_app.jpg?raw=true)
 
@@ -80,9 +80,9 @@ InstructPix2Pix is trained by fine-tuning from an initial StableDiffusion checkp
 ```
 bash scripts/download_pretrained_sd.sh
 ```
-If you'd like to use a different checkpoint, point to it in the config file `configs/train.yaml`, on line 8, after `ckpt_path:`.
+If you'd like to use a different checkpoint, point to it in the config file `configs/train.yaml`, on line 8, after `ckpt_path:`.
 
-Next, we need to change the config to point to our downloaded (or generated) dataset. If you're using the `clip-filtered-dataset` from above, you can skip this. Otherwise, you may need to edit lines 85 and 94 of the config (`data.params.train.params.path`, `data.params.validation.params.path`).
+Next, we need to change the config to point to our downloaded (or generated) dataset. If you're using the `clip-filtered-dataset` from above, you can skip this. Otherwise, you may need to edit lines 85 and 94 of the config (`data.params.train.params.path`, `data.params.validation.params.path`).
 
 Finally, start a training job with the following command:
 
@@ -101,7 +101,7 @@ We provide our generated dataset of captions and edit instructions [here](https:
 
 #### (1.1) Manually write a dataset of instructions and captions
 
-The first step of the process is fine-tuning GPT-3. To do this, we made a dataset of 700 examples broadly covering of edits that we might want our model to be able to perform. Our examples are available [here](https://instruct-pix2pix.eecs.berkeley.edu/human-written-prompts.jsonl). These should be diverse and cover a wide range of possible captions and types of edits. Ideally, they should avoid duplication or significant overlap of captions and instructions. It is also important to be mindful of limitations of Stable Diffusion and Prompt-to-Prompt in writing these examples, such as inability to perform large spatial transformations (e.g., moving the camera, zooming in, swapping object locations).
+The first step of the process is fine-tuning GPT-3. To do this, we made a dataset of 700 examples broadly covering of edits that we might want our model to be able to perform. Our examples are available [here](https://instruct-pix2pix.eecs.berkeley.edu/human-written-prompts.jsonl). These should be diverse and cover a wide range of possible captions and types of edits. Ideally, they should avoid duplication or significant overlap of captions and instructions. It is also important to be mindful of limitations of Stable Diffusion and Prompt-to-Prompt in writing these examples, such as inability to perform large spatial transformations (e.g., moving the camera, zooming in, swapping object locations).
 
 Input prompts should closely match the distribution of input prompts used to generate the larger dataset. We sampled the 700 input prompts from the _LAION Improved Aesthetics 6.5+_ dataset and also use this dataset for generating examples. We found this dataset is quite noisy (many of the captions are overly long and contain irrelevant text). For this reason, we also considered MSCOCO and LAION-COCO datasets, but ultimately chose _LAION Improved Aesthetics 6.5+_ due to its diversity of content, proper nouns, and artistic mediums. If you choose to use another dataset or combination of datasets as input to GPT-3 when generating examples, we recommend you sample the input prompts from the same distribution when manually writing training examples.
 
@@ -211,6 +211,3 @@ If you're not getting the quality result you want, there may be a few reasons:
 year={2022}
 }
 ```
-
-
-
edit_app.py
CHANGED
@@ -5,9 +5,8 @@ import random
 
 import gradio as gr
 import torch
-from PIL import Image, ImageOps
 from diffusers import StableDiffusionInstructPix2PixPipeline
-
+from PIL import Image, ImageOps
 
 help_text = """
 If you're not getting what you want, there may be a few reasons:
@@ -46,8 +45,11 @@ example_instructions = [
 
 model_id = "timbrooks/instruct-pix2pix"
 
+
 def main():
-    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None).to("cuda")
+    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
+        model_id, torch_dtype=torch.float16, safety_checker=None
+    ).to("cuda")
     example_image = Image.open("imgs/example.jpg").convert("RGB")
 
     def load_example(
@@ -96,9 +98,12 @@ def main():
 
         generator = torch.manual_seed(seed)
         edited_image = pipe(
-            instruction,
-            [argument line not preserved in this view]
-            [argument line not preserved in this view]
+            instruction,
+            image=input_image,
+            guidance_scale=text_cfg_scale,
+            image_guidance_scale=image_cfg_scale,
+            num_inference_steps=steps,
+            generator=generator,
         ).images[0]
         return [seed, text_cfg_scale, image_cfg_scale, edited_image]
 
@@ -106,14 +111,16 @@ def main():
         return [0, "Randomize Seed", 1371, "Fix CFG", 7.5, 1.5, None]
 
     with gr.Blocks() as demo:
-        gr.HTML("""<h1 style="font-weight: 900; margin-bottom: 7px;">
+        gr.HTML(
+            """<h1 style="font-weight: 900; margin-bottom: 7px;">
 InstructPix2Pix: Learning to Follow Image Editing Instructions
 </h1>
 <p>For faster inference without waiting in queue, you may duplicate the space and upgrade to GPU in settings.
 <br/>
 <a href="https://huggingface.co/spaces/timbrooks/instruct-pix2pix?duplicate=true">
 <img style="margin-top: 0em; margin-bottom: 0em" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>
-<p/>""")
+<p/>"""
+        )
         with gr.Row():
             with gr.Column(scale=1, min_width=100):
                 generate_button = gr.Button("Generate")
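For reference, the pipeline call reformatted above uses the standard diffusers InstructPix2Pix API. Below is a minimal, self-contained sketch of the same call outside the Gradio app; the edit instruction, step count, and file paths are illustrative placeholders, and a CUDA GPU is assumed.

```python
# Minimal sketch of the call reformatted in edit_app.py (not part of the commit).
# Assumes a CUDA GPU and a local input image; instruction, steps, and paths are illustrative.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, safety_checker=None
).to("cuda")

input_image = Image.open("imgs/example.jpg").convert("RGB")
generator = torch.manual_seed(1371)  # same default seed the app uses

edited_image = pipe(
    "make it look like a watercolor painting",  # text instruction (illustrative)
    image=input_image,                          # image to edit
    guidance_scale=7.5,                         # text CFG scale (app default)
    image_guidance_scale=1.5,                   # image CFG scale (app default)
    num_inference_steps=50,                     # illustrative step count
    generator=generator,
).images[0]
edited_image.save("imgs/output.jpg")
```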
requirements.txt
CHANGED
@@ -1,6 +1,6 @@
 -f --extra-index-url https://download.pytorch.org/whl/cu116
+git+https://github.com/huggingface/diffusers
+numpy
 torch
 torchvision
-numpy
 transformers
-git+https://github.com/huggingface/diffusers
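The reordering here is consistent with the `requirements-txt-fixer` hook added above: package lines are sorted alphabetically while the `-f --extra-index-url` option line stays at the top.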