Multi-image Inference

#10
by annabavaresco - opened

Hi, I was wondering whether this version of Molmo supports multi-image inference and, if so, what the correct way to process the inputs is. Thanks in advance!

It does not at the moment.

I see, thanks for your reply!

As a follow-up, could you explain how you evaluate on MMMU? Doesn't it contain interleaved image-text data?
