# Evaluation pipeline on MMBench

## Intro to each data sample in MMBench

MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:

```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the L2-level category
options_dict: the dict that contains all options
index: the unique identifier of the current question
context (optional): the context to a question, which is optional
answer: the target answer to the current question (only exists in the dev split, and is kept confidential for the test split on our evaluation server)
```

## Load MMBench

We provide a code snippet as an example of loading MMBench:

```python
import base64
import io

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


def decode_base64_to_image(base64_string):
    """Decode a base64-encoded string into a PIL image."""
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image


class MMBenchDataset(Dataset):
    def __init__(self, data_file, sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        index = self.df.iloc[idx]['index']
        image = decode_base64_to_image(self.df.iloc[idx]['image'])
        question = self.df.iloc[idx]['question']
        # The 'answer' column only exists in the dev split.
        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.columns else None
        category = self.df.iloc[idx]['category']
        l2_category = self.df.iloc[idx]['l2-category']

        # Collect whichever option columns are present for this question.
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        return None
```

## How to construct the inference prompt

```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```

For example:

Question: Which category does this image belong to?
A. Oil Painting
B. Sketch
C. Digital art
D. Photo
```python
prompt = """
###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo
###Assistant:
"""
```

You can make custom modifications to the prompt.

## How to save results

You should dump your model's predictions into an Excel (.xlsx) file, and this file should contain the following fields:

```
question: the question
A: the first choice
B: the second choice
C: the third choice
D: the fourth choice
prediction: your model's prediction for the current question
category: the leaf category
l2_category: the L2-level category
index: the question index
```

If a question has fewer than four options, simply leave the unused option fields blank.
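As a minimal sketch of the dump step, assuming predictions have been collected into a list of dicts with the fields above (the sample values and the `submission.xlsx` filename are illustrative, and `DataFrame.to_excel` requires an engine such as `openpyxl` to be installed):

```python
import pandas as pd

# Hypothetical results collected during inference: one dict per question,
# holding the option texts plus the model's predicted answer.
results = [
    {
        'question': 'Which category does this image belong to?',
        'A': 'Oil Painting',
        'B': 'Sketch',
        'C': 'Digital art',
        'D': 'Photo',
        'prediction': 'D',
        'category': 'image_style',       # illustrative leaf category
        'l2_category': 'coarse_perception',  # illustrative L2 category
        'index': 0,
    },
]

# Fix the column order to match the expected fields; questions with fewer
# than four options simply leave the missing option columns blank (NaN).
columns = ['question', 'A', 'B', 'C', 'D',
           'prediction', 'category', 'l2_category', 'index']
df = pd.DataFrame(results, columns=columns)
df.to_excel('submission.xlsx', index=False)
```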