Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Abstract
Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. These descriptors can be extracted for both synthetic and real images using the generation and inversion processes. We evaluate the utility of our Diffusion Hyperfeatures on the task of semantic keypoint correspondence: our method achieves superior performance on the SPair-71k real image benchmark. We also demonstrate that our method is flexible and transferable: our feature aggregation network trained on the inversion features of real image pairs can be used on the generation features of synthetic image pairs with unseen objects and compositions. Our code is available at https://diffusion-hyperfeatures.github.io.
Community
Proposes Diffusion Hyperfeatures: because a diffusion model's internal features are spread over network layers and diffusion timesteps, the method consolidates multi-scale, multi-timestep feature maps into per-pixel feature descriptors for downstream tasks. A learnable and interpretable feature aggregation network learns mixing weights across the diffusion model's feature maps; features are extracted via the inversion process for real images and the generation process for synthetic images. Each feature map is passed through a bottleneck layer, upsampled to a fixed resolution, and combined using its learned mixing weight. Training uses a symmetric cross-entropy loss over cosine similarities for semantic correspondence. Achieves a higher percentage of correct keypoints within a small neighborhood than DHPF, DINO, and single-layer diffusion features on SPair-71k and CUB. Variants of the Stable Diffusion model are explored in the supplementary material. From UC Berkeley and Google.
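The aggregation and loss described above can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the bottleneck design, channel sizes, output dimension, temperature, and all module names (`FeatureAggregator`, `symmetric_ce_loss`) are assumptions for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregator(nn.Module):
    """Sketch: project each (layer, timestep) feature map through a 1x1
    bottleneck, upsample to a shared resolution, and combine with learned
    softmax mixing weights. Hyperparameters are illustrative."""

    def __init__(self, channel_dims, out_dim=384, resolution=64):
        super().__init__()
        self.resolution = resolution
        # one bottleneck per source feature map (one per layer/timestep pair)
        self.bottlenecks = nn.ModuleList(
            nn.Conv2d(c, out_dim, kernel_size=1) for c in channel_dims
        )
        # one learnable mixing weight per source feature map
        self.mixing_logits = nn.Parameter(torch.zeros(len(channel_dims)))

    def forward(self, feature_maps):
        # feature_maps: list of (B, C_i, H_i, W_i) tensors taken from
        # different layers and timesteps of the diffusion model
        weights = torch.softmax(self.mixing_logits, dim=0)
        out = 0.0
        for w, bottleneck, fmap in zip(weights, self.bottlenecks, feature_maps):
            x = bottleneck(fmap)  # unify channel dimension
            x = F.interpolate(x, size=(self.resolution, self.resolution),
                              mode="bilinear", align_corners=False)
            out = out + w * x  # weighted sum across layers/timesteps
        return out  # (B, out_dim, resolution, resolution) hyperfeatures

def symmetric_ce_loss(desc_a, desc_b, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities between descriptors
    sampled at matched keypoints in two images (N matched pairs)."""
    a = F.normalize(desc_a, dim=-1)
    b = F.normalize(desc_b, dim=-1)
    sim = a @ b.t() / temperature          # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0))      # i-th descriptor matches i-th
    return 0.5 * (F.cross_entropy(sim, targets)
                  + F.cross_entropy(sim.t(), targets))
```

The softmax over `mixing_logits` keeps the per-map weights interpretable: after training, they indicate which layers and timesteps contribute most to the descriptors.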