HugoLaurencon posted an update Apr 22
Idefics2 is trained mostly on OBELICS, our open interleaved image-text document dataset.

Training on interleaved data is crucial for three things: reaching high performance on VQA tasks, accepting an arbitrary number of images as input, and doing in-context learning.
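To make "interleaved" concrete, here is a minimal sketch of how one such document could be flattened into a single training sequence. It assumes a record with two parallel lists, `images` and `texts`, where each position holds either an image URL or a text span and the other slot is `None`. The sample record, the `flatten_interleaved` helper, and the `<image>` placeholder are all hypothetical and are not taken from the Idefics2 training code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical placeholder string; the real model may use a different token.
IMAGE_TOKEN = "<image>"


def flatten_interleaved(
    images: Sequence[Optional[str]],
    texts: Sequence[Optional[str]],
    image_token: str = IMAGE_TOKEN,
) -> Tuple[str, List[str]]:
    """Merge parallel image/text lists into one prompt, preserving document order.

    Returns the prompt string and the image URLs in the order they appear.
    """
    if len(images) != len(texts):
        raise ValueError("images and texts must have the same length")

    parts: List[str] = []
    urls: List[str] = []
    for image, text in zip(images, texts):
        if image is not None:
            # An image slot: emit a placeholder and remember the URL.
            parts.append(image_token)
            urls.append(image)
        elif text is not None:
            # A text slot: keep the text as-is.
            parts.append(text)
        # Positions where both slots are None are skipped.
    return "\n".join(parts), urls


# Made-up record in the assumed layout (not real OBELICS data).
record = {
    "images": [None, "https://example.com/cat.jpg", None, "https://example.com/dog.jpg"],
    "texts": ["A short intro paragraph.", None, "Text between the two images.", None],
}

prompt, image_urls = flatten_interleaved(record["images"], record["texts"])
print(prompt)
print(image_urls)
```

Because the number of image slots varies from one document to the next, a model trained on sequences like this sees inputs with zero, one, or many images, which is the property the sentence above refers to.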

Dataset: HuggingFaceM4/OBELICS
Nomic visualization: https://atlas.nomic.ai/map/f2fba2aa-3647-4f49-a0f3-9347daeee499/ee4a84bd-f125-4bcc-a683-1b4e231cb10f
Link to OBELICS thread: https://twitter.com/HugoLaurencon/status/1694005892839006301