arxiv:2310.08541

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Published on Oct 12, 2023

Submitted by akhaliq on Oct 13, 2023

#3 Paper of the day

Authors: Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Lijuan Wang

Abstract

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. Idea2Img cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. The iterative self-refinement brings Idea2Img various advantages over vanilla T2I models. Notably, Idea2Img can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual qualities. The user preference study validates the efficacy of multimodal iterative self-refinement on automatic image design and generation.
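The refinement cycle described in the abstract (revise the prompt, synthesize a draft, give feedback, remember the round) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `refine`, `Round`, `Memory`, and the three callables are hypothetical, the image is represented by a plain string, and the toy functions stand in for the LMM and the T2I model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Round:
    """One refinement round kept in memory (hypothetical structure)."""
    prompt: str
    image: str      # stand-in for real image data
    feedback: str   # empty string when the draft was accepted


@dataclass
class Memory:
    """History of what was tried against the probed T2I model."""
    rounds: List[Round] = field(default_factory=list)


def refine(
    idea: str,
    revise_prompt: Callable[[str, Memory], str],
    text_to_image: Callable[[str], str],
    give_feedback: Callable[[str, str, Memory], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, Memory]:
    """Run the revise -> draft -> feedback cycle, conditioned on memory."""
    memory = Memory()
    image = ""
    for _ in range(max_rounds):
        prompt = revise_prompt(idea, memory)          # assumed LMM call
        image = text_to_image(prompt)                 # assumed T2I call
        feedback = give_feedback(idea, image, memory) # assumed LMM call
        memory.rounds.append(Round(prompt, image, feedback or ""))
        if feedback is None:  # None means the draft is accepted
            break
    return image, memory


# Toy stand-ins so the sketch runs without any model access.
def toy_revise(idea: str, memory: Memory) -> str:
    # Append earlier feedback to the idea to form the next prompt.
    return ", ".join([idea] + [r.feedback for r in memory.rounds])


def toy_t2i(prompt: str) -> str:
    return f"<image of: {prompt}>"


def toy_feedback(idea: str, image: str, memory: Memory) -> Optional[str]:
    # Ask for more detail once, then accept the draft.
    return "add more detail" if not memory.rounds else None


final_image, history = refine(
    "a cozy reading nook", toy_revise, toy_t2i, toy_feedback
)
print(final_image)
print(len(history.rounds), "rounds")
```

Passing the model calls in as plain callables keeps the loop independent of any particular LMM or T2I backend; in a real system each callable would presumably wrap a model request, and the stopping rule would likely be more involved than a single `None` check.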

Community

michaeluffer

Oct 18, 2023

Very interesting idea to use an LLM to refine and expand prompts for better image generation. Do you have a demo of this online? Is the code open source?

zyang39

Paper author Oct 19, 2023

Thank you for your interest. We are preparing the code and will release it soon. Thanks.

itsinuxx

Jan 9

Do you have a release date? I'm dying to test this technology!

itsinuxx

Jan 9

When are you going to release it?