|
# Evaluation pipeline on MMBench |
|
|
|
## Intro to each data sample in MMBench |
|
|
|
MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:
|
|
|
```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the L2-level category
options_dict: a dict containing all options
index: the unique identifier of the current question
context (optional): the context of a question
answer: the target answer to the current question (only exists in the dev split; kept confidential for the test split on our evaluation server)
```
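As a quick sanity check, you can load a split with pandas and inspect these fields directly. Below is a minimal sketch; the filename `mmbench_dev.tsv` is a placeholder for wherever you saved the split, and the exact column names may vary by release.

```python
import pandas as pd

# Placeholder path to a downloaded MMBench split (TSV format).
df = pd.read_csv('mmbench_dev.tsv', sep='\t')

print(df.columns.tolist())     # inspect which fields this split provides
print(df.iloc[0]['question'])  # the first question as plain text
```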
|
|
|
## Load MMBench |
|
|
|
We provide a code snippet below as an example of loading MMBench:
|
|
|
```python
import base64
import io

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    """Decode a base64-encoded string into a PIL image."""
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):
    def __init__(self,
                 data_file,
                 sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = self.df.iloc[idx]['image']
        image = decode_base64_to_image(image)
        question = self.df.iloc[idx]['question']
        # The 'answer' column only exists in the dev split.
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.columns else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']

        # Collect the options that are present for this question.
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        # The 'hint' column holds the optional context of the question.
        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        """Return the value of `key` for row `idx`, or None if absent or NaN."""
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        return None
```
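For example, you could instantiate the dataset and inspect one sample as follows; `mmbench_dev.tsv` is again a placeholder path:

```python
dataset = MMBenchDataset('mmbench_dev.tsv')  # placeholder path
sample = dataset[0]
print(sample['question'])
print(sample['options'])
sample['img'].save('sample.png')  # the decoded PIL image
```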
|
|
|
## How to construct the inference prompt |
|
|
|
```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```
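Putting the dataset and the prompt together, an inference loop might look like the sketch below. `model.generate` is a hypothetical stand-in for whatever interface your model exposes; it is not part of MMBench.

```python
results = []
for i in range(len(dataset)):
    data_sample = dataset[i]
    if data_sample['context'] is not None:
        prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
    else:
        prompt = data_sample['question'] + ' ' + data_sample['options']

    # `model.generate` is a hypothetical placeholder for your model's API.
    prediction = model.generate(image=data_sample['img'], prompt=prompt)
    results.append({
        'question': data_sample['question'],
        **data_sample['options_dict'],  # spreads the A/B/C/D(/E) options
        'prediction': prediction,
        'category': data_sample['category'],
        'l2_category': data_sample['l2-category'],
        'index': data_sample['index'],
    })
```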
|
|
|
For example: |
|
Question: Which category does this image belong to? |
|
A. Oil Painting |
|
B. Sketch |
|
C. Digital art |
|
D. Photo |
|
|
|
<div align=center> |
|
<img src="https://github-production-user-asset-6210df.s3.amazonaws.com/34324155/255581681-1364ef43-bd27-4eb5-b9e5-241327b1f920.png" width="50%"/> |
|
</div> |
|
|
|
```python
prompt = """
###Human: Question: Which category does this image belong to?
There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo
###Assistant:
"""
```
|
|
|
You can make custom modifications to the prompt to match your model's expected input; one possible approach is sketched below.
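For instance, a small helper could wrap the question and options in your model's chat template. The `###Human:`/`###Assistant:` markers below are only an illustration, assuming a Vicuna-style template; substitute whatever format your model uses.

```python
def build_prompt(data_sample):
    """Build an inference prompt in a Vicuna-style chat format (illustrative only)."""
    body = data_sample['question'] + '\n' + data_sample['options']
    if data_sample['context'] is not None:
        body = data_sample['context'] + '\n' + body
    return f'###Human: {body}###Assistant:'
```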
|
|
|
## How to save results
|
|
|
You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:
|
|
|
```
question: the question
A: the first choice
B: the second choice
C: the third choice
D: the fourth choice
prediction: your model's prediction for the current question
category: the leaf category
l2_category: the L2-level category
index: the question index
```
|
|
|
If a question has fewer than four options, simply leave the unused option fields blank.
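Continuing from the `results` list built in the inference loop above, the predictions can be written out with pandas. Note that `DataFrame.to_excel` requires the `openpyxl` package for .xlsx output; the output filename is a placeholder.

```python
import pandas as pd

df = pd.DataFrame(results)
# Options missing for some questions (e.g. no option D) stay as NaN,
# which pandas writes as empty cells in the .xlsx file.
df.to_excel('mmbench_predictions.xlsx', index=False)  # placeholder filename
```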
|
|