Multi-image Inference
#10, opened by annabavaresco
Hi, I was wondering whether this version of Molmo supports multi-image inference and, if so, what the correct way of processing the inputs is. Thanks in advance!
It does not at the moment.
I see, thanks for your reply!
As a follow-up, could you explain how you evaluate on MMMU? Doesn't it contain interleaved image-text data?