How to try VQA
#4 opened by seemorebricks
Hello,
I see this in your paper regarding implementing VQA and/or captioning:
'''
We utilize the METER (Dou et al., 2022) framework to facilitate our experiments on visual question answering (VQA). It formulates the VQA task as a classification task. The core module of METER is a transformer-based co-attention multimodal fusion module that produces cross-modal representations over the image and text encodings, which are then fed to a classifier for predicting the final answer.
'''
Is there source code available for this task? Apologies, I'm new to the field, so this isn't quite intuitive to me.
Please refer to the METER package at https://github.com/zdou0830/METER.
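For intuition on what "formulating VQA as classification" means, here is a minimal sketch in PyTorch, assuming precomputed image and text encoder outputs. It is not METER's actual code; the module names, hidden size, and answer vocabulary size are placeholders for illustration only.

```python
# Minimal sketch of VQA as classification (illustrative only;
# names and dimensions are placeholders, not METER's actual code).
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Toy cross-modal fusion: question tokens attend over image patch features."""

    def __init__(self, hidden_dim=768, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, text_len, hidden); image_feats: (batch, num_patches, hidden)
        fused, _ = self.cross_attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(fused + text_feats)


class VQAClassifier(nn.Module):
    """Predicts an answer from a fixed answer vocabulary, i.e. a classification head."""

    def __init__(self, hidden_dim=768, num_answers=3129):
        super().__init__()
        self.fusion = CoAttentionFusion(hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 2),
            nn.GELU(),
            nn.Linear(hidden_dim * 2, num_answers),
        )

    def forward(self, text_feats, image_feats):
        fused = self.fusion(text_feats, image_feats)
        # Pool the fused sequence (here: the first token) and score each candidate answer.
        return self.head(fused[:, 0])


if __name__ == "__main__":
    # Random tensors standing in for the image/text encoder outputs.
    text_feats = torch.randn(2, 16, 768)    # e.g. question token embeddings
    image_feats = torch.randn(2, 196, 768)  # e.g. ViT patch embeddings
    logits = VQAClassifier()(text_feats, image_feats)
    print(logits.shape)  # (2, 3129): scores over the answer vocabulary
```

The key point is that the model never generates free-form text: the fused image-question representation is mapped to logits over a fixed set of candidate answers, and training uses a standard classification loss. The METER repository linked above contains the full pipeline, including the actual encoders and fusion module.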
shengz changed discussion status to closed