Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Abstract
Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. These descriptors can be extracted for both synthetic and real images using the generation and inversion processes. We evaluate the utility of our Diffusion Hyperfeatures on the task of semantic keypoint correspondence: our method achieves superior performance on the SPair-71k real image benchmark. We also demonstrate that our method is flexible and transferable: our feature aggregation network trained on the inversion features of real image pairs can be used on the generation features of synthetic image pairs with unseen objects and compositions. Our code is available at https://diffusion-hyperfeatures.github.io.
Community
Proposes Diffusion Hyperfeatures: because a diffusion model's internal features are spread over network layers and diffusion timesteps, the method consolidates multi-scale, multi-timestep feature maps into per-pixel feature descriptors for downstream tasks. A learnable and interpretable feature aggregation network learns mixing weights across the diffusion model's feature maps; features are extracted via the inversion process for real images and the generation process for synthetic images. Each feature map is passed through a bottleneck layer, upsampled to a fixed resolution, and combined using its learned mixing weight. Training uses a symmetric cross-entropy loss over cosine similarities for semantic correspondence. Achieves a higher percentage of correct keypoints within a small neighborhood than DHPF, DINO, and single-layer diffusion features on SPair-71k and CUB. Variants of the Stable Diffusion model are explored in the supplementary material. From UC Berkeley and Google.
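The aggregation and loss described above can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the bottleneck design, channel sizes, output dimension, temperature, and all module names (`FeatureAggregator`, `symmetric_ce_loss`) are assumptions for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregator(nn.Module):
    """Sketch: project each (layer, timestep) feature map through a 1x1
    bottleneck, upsample to a shared resolution, and combine with learned
    softmax mixing weights. Hyperparameters are illustrative."""

    def __init__(self, channel_dims, out_dim=384, resolution=64):
        super().__init__()
        self.resolution = resolution
        # one bottleneck per source feature map (one per layer/timestep pair)
        self.bottlenecks = nn.ModuleList(
            nn.Conv2d(c, out_dim, kernel_size=1) for c in channel_dims
        )
        # one learnable mixing weight per source feature map
        self.mixing_logits = nn.Parameter(torch.zeros(len(channel_dims)))

    def forward(self, feature_maps):
        # feature_maps: list of (B, C_i, H_i, W_i) tensors taken from
        # different layers and timesteps of the diffusion model
        weights = torch.softmax(self.mixing_logits, dim=0)
        out = 0.0
        for w, bottleneck, fmap in zip(weights, self.bottlenecks, feature_maps):
            x = bottleneck(fmap)  # unify channel dimension
            x = F.interpolate(x, size=(self.resolution, self.resolution),
                              mode="bilinear", align_corners=False)
            out = out + w * x  # weighted sum across layers/timesteps
        return out  # (B, out_dim, resolution, resolution) hyperfeatures

def symmetric_ce_loss(desc_a, desc_b, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities between descriptors
    sampled at matched keypoints in two images (N matched pairs)."""
    a = F.normalize(desc_a, dim=-1)
    b = F.normalize(desc_b, dim=-1)
    sim = a @ b.t() / temperature          # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0))      # i-th descriptor matches i-th
    return 0.5 * (F.cross_entropy(sim, targets)
                  + F.cross_entropy(sim.t(), targets))
```

The softmax over `mixing_logits` keeps the per-map weights interpretable: after training, they indicate which layers and timesteps contribute most to the descriptors.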