arxiv:2301.08243

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Published on Jan 19, 2023

Authors:

Yann LeCun ,

Abstract

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
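The masking strategy described in the abstract can be illustrated with a minimal sketch. It samples several rectangular target blocks and one large context block on a grid of patches, then removes the target patches from the context so that the targets must be predicted rather than copied. The function names (`sample_block`, `sample_masks`), the 14x14 patch grid, the number of targets, and the scale and aspect-ratio ranges are illustrative assumptions, not values stated in the abstract, and this is not the authors' implementation.

```python
import math
import random


def sample_block(grid, scale_range, aspect_range, rng):
    """Sample one rectangular block of patch coordinates on a grid x grid layout.

    scale_range is the fraction of the image the block should cover;
    aspect_range is the height/width ratio. Both are assumed hyperparameters.
    """
    scale = rng.uniform(*scale_range)
    aspect = rng.uniform(*aspect_range)
    area = scale * grid * grid
    # Clamp the block so it always fits inside the grid.
    h = max(1, min(grid, round(math.sqrt(area * aspect))))
    w = max(1, min(grid, round(math.sqrt(area / aspect))))
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}


def sample_masks(grid=14, num_targets=4, seed=0):
    """Return (context, targets) as sets of (row, col) patch coordinates."""
    rng = random.Random(seed)
    # (a) Target blocks with a sufficiently large scale (assumed 15-20% each).
    targets = [
        sample_block(grid, (0.15, 0.2), (0.75, 1.5), rng)
        for _ in range(num_targets)
    ]
    # (b) A large, spatially distributed context block (assumed 85-100%).
    context = sample_block(grid, (0.85, 1.0), (1.0, 1.0), rng)
    # Remove target patches from the context so prediction is non-trivial.
    for target in targets:
        context -= target
    return context, targets


if __name__ == "__main__":
    context, targets = sample_masks()
    print("context patches:", len(context))
    print("target block sizes:", [len(t) for t in targets])
```

In a full training loop, the context patches would be fed to a context encoder and a predictor would regress the target-encoder representations of each target block; that part is omitted here because the abstract does not specify it.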

Community

Aihey

Jun 23, 2023

Killer

Models citing this paper 4

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 1