Image editing with InstructPix2Pix

AI image editing models have traditionally focused on a single editing task, such as style transfer or translation between image domains. InstructPix2Pix instead edits images by following human instructions: given an input image and a written instruction that tells the model what to do, the model edits the image accordingly.

This notebook demonstrates how to use the InstructPix2Pix model for image editing with OpenVINO.

The complete pipeline of this demo is shown below.

In this demo, you provide an input image and a text-based instruction, and the pipeline generates a new image that reflects the instruction. The diffusion process iteratively denoises a latent image representation, conditioned on the text embeddings produced by the text encoder and on the original image encoded by a variational autoencoder (VAE).
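The data flow described above can be illustrated with a toy sketch in plain Python. Every function here (`encode_text`, `encode_image`, `predict_noise`, `decode_latents`, `edit_image`) is a hypothetical stand-in invented for illustration; none of them is the notebook's code or a real model, and the "scheduler step" is a simple fixed-fraction update rather than an actual diffusion scheduler. The sketch only shows how the text embeddings and the encoded image condition each denoising step.

```python
import random


def encode_text(prompt):
    # Stand-in for the text encoder: one number per word (assumption).
    return [len(word) / 10.0 for word in prompt.split()]


def encode_image(pixels):
    # Stand-in for the VAE encoder: scale pixel values to [0, 1].
    return [p / 255.0 for p in pixels]


def decode_latents(latents):
    # Stand-in for the VAE decoder: map back to the pixel range and clamp.
    return [min(255, max(0, round(x * 255))) for x in latents]


def predict_noise(latents, image_latents, text_embeddings):
    # Stand-in for the UNet. It sees the current latents plus both
    # conditioning signals. The toy "edit" is a brightness shift whose
    # size depends on the text embeddings.
    shift = 0.1 * sum(text_embeddings) / len(text_embeddings)
    target = [x + shift for x in image_latents]
    return [cur - tgt for cur, tgt in zip(latents, target)]


def edit_image(prompt, pixels, num_steps=20, seed=0):
    rng = random.Random(seed)
    text_embeddings = encode_text(prompt)   # conditioning signal 1
    image_latents = encode_image(pixels)    # conditioning signal 2
    # Start from random noise in latent space.
    latents = [rng.gauss(0.0, 1.0) for _ in image_latents]
    for _ in range(num_steps):
        noise = predict_noise(latents, image_latents, text_embeddings)
        # Toy scheduler step: remove half of the predicted noise.
        latents = [cur - 0.5 * n for cur, n in zip(latents, noise)]
    return decode_latents(latents)


edited = edit_image("make it brighter", [0, 64, 128])
print(edited)
```

In the real pipeline, each of these stand-ins is a neural network or a scheduler, and the latents are multi-dimensional tensors rather than flat lists.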

The following image shows an example of the input image with text-based prompt and the corresponding edited image.

Notebook Contents

This notebook demonstrates how to convert and run the InstructPix2Pix model using OpenVINO.

The notebook contains the following steps:

Convert PyTorch models to OpenVINO IR format using the Model Conversion API.
Run InstructPix2Pix pipeline with OpenVINO.
Optimize InstructPix2Pix pipeline with NNCF quantization.
Compare results of original and optimized pipelines.
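The conversion and quantization steps listed above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper names `convert_to_ir` and `quantize_ir`, their parameters, and the caching behavior are assumptions made for illustration. It relies on `openvino.convert_model`, `openvino.save_model`, `openvino.Core.read_model`, `nncf.Dataset`, and `nncf.quantize`. The heavy imports are placed inside the functions so the sketch can be loaded without the packages installed.

```python
from pathlib import Path


def convert_to_ir(torch_module, example_input, ir_path):
    """Convert a PyTorch module to OpenVINO IR (hypothetical helper)."""
    ir_path = Path(ir_path)
    if ir_path.exists():
        # Reuse a previously converted model instead of converting again.
        return ir_path

    import openvino as ov  # requires the openvino package

    torch_module.eval()
    # Trace the module with a representative input and build an ov.Model.
    ov_model = ov.convert_model(torch_module, example_input=example_input)
    # Writes the .xml file (topology) and a matching .bin file (weights).
    ov.save_model(ov_model, str(ir_path))
    return ir_path


def quantize_ir(ir_path, calibration_samples, quantized_path):
    """Apply NNCF post-training quantization to an IR model (hypothetical helper).

    calibration_samples is assumed to be an iterable of model inputs in the
    same format the model accepts for inference.
    """
    import nncf  # requires the nncf package
    import openvino as ov

    core = ov.Core()
    model = core.read_model(str(ir_path))
    quantized_model = nncf.quantize(model, nncf.Dataset(calibration_samples))
    ov.save_model(quantized_model, str(quantized_path))
    return Path(quantized_path)
```

A pipeline such as InstructPix2Pix consists of several models (text encoder, VAE, and UNet), so a helper like `convert_to_ir` would be called once per model, each with its own example input.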

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.