How to try VQA
#4 opened by seemorebricks
Hello,
I see this in your paper regarding implementing VQA and/or captioning:
'''
We utilize the METER (Dou et al., 2022) framework to facilitate our experiments on visual question answering (VQA). It formulates the VQA task as a classification task. The core module of METER is a transformer-based co-attention multimodal fusion module that produces cross-modal representations over the image and text encodings, which are then fed to a classifier for predicting the final answer.
'''
Is there source code available for this task? Apologies, I'm new to the field, so this isn't quite intuitive to me.
Please refer to the METER package at https://github.com/zdou0830/METER.
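For intuition on what "formulating VQA as classification" means, here is a minimal sketch in PyTorch, assuming precomputed image and text encoder outputs. It is not METER's actual code; the module names, hidden size, and answer vocabulary size are placeholders for illustration only.

```python
# Minimal sketch of VQA as classification (illustrative only;
# names and dimensions are placeholders, not METER's actual code).
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Toy cross-modal fusion: question tokens attend over image patch features."""

    def __init__(self, hidden_dim=768, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, text_len, hidden); image_feats: (batch, num_patches, hidden)
        fused, _ = self.cross_attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(fused + text_feats)


class VQAClassifier(nn.Module):
    """Predicts an answer from a fixed answer vocabulary, i.e. a classification head."""

    def __init__(self, hidden_dim=768, num_answers=3129):
        super().__init__()
        self.fusion = CoAttentionFusion(hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 2),
            nn.GELU(),
            nn.Linear(hidden_dim * 2, num_answers),
        )

    def forward(self, text_feats, image_feats):
        fused = self.fusion(text_feats, image_feats)
        # Pool the fused sequence (here: the first token) and score each candidate answer.
        return self.head(fused[:, 0])


if __name__ == "__main__":
    # Random tensors standing in for the image/text encoder outputs.
    text_feats = torch.randn(2, 16, 768)    # e.g. question token embeddings
    image_feats = torch.randn(2, 196, 768)  # e.g. ViT patch embeddings
    logits = VQAClassifier()(text_feats, image_feats)
    print(logits.shape)  # (2, 3129): scores over the answer vocabulary
```

The key point is that the model never generates free-form text: the fused image-question representation is mapped to logits over a fixed set of candidate answers, and training uses a standard classification loss. The METER repository linked above contains the full pipeline, including the actual encoders and fusion module.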
shengz changed discussion status to closed