# Multi-modality Evaluation
We support several multi-modality datasets, such as [MMBench](https://opencompass.org.cn/MMBench) and [SEED-Bench](https://github.com/AILab-CVC/SEED-Bench), for evaluating multi-modality models. Before starting, please make sure you have downloaded the evaluation datasets following the official instructions.
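For reference, the example config later in this document expects the MMBench TSV file at the path below, relative to the repository root (the exact layout depends on the dataset you download):
```sh
# expected layout, matching the `data_file` path used in the example config below
data/
└── mmbench/
    └── mmbench_test_20230712.tsv
```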
## Start Evaluation
Before evaluation, you can modify `configs/multimodal/tasks.py`, or create a new file in the same style, to evaluate your own model.
Generally, we run the evaluation with one of the commands below, where `$PARTITION` is the Slurm partition to submit the job to.
### Slurm
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval --slurm -p $PARTITION
```
### PyTorch
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval
```
## Configuration File
We adopt the new config format of [MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#a-pure-python-style-configuration-file-beta).
### Task File
Here is an example config, `configs/multimodal/tasks.py`:
```python
from mmengine.config import read_base
with read_base():
    from .minigpt_4.minigpt_4_7b_mmbench import (minigpt_4_mmbench_dataloader,
                                                 minigpt_4_mmbench_evaluator,
                                                 minigpt_4_mmbench_load_from,
                                                 minigpt_4_mmbench_model)
models = [minigpt_4_mmbench_model]
datasets = [minigpt_4_mmbench_dataloader]
evaluators = [minigpt_4_mmbench_evaluator]
load_froms = [minigpt_4_mmbench_load_from]
# set the platform and resources
num_gpus = 8
num_procs = 8
launcher = 'pytorch'
```
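To evaluate your own model, a task file in the same style works. Here is a minimal sketch, assuming you have written a config module `configs/multimodal/my_model/my_model_mmbench.py` that exports the same four objects (all `my_model_*` names are placeholders, not part of OpenCompass):
```python
from mmengine.config import read_base

with read_base():
    # hypothetical config module; replace with your own
    from .my_model.my_model_mmbench import (my_model_mmbench_dataloader,
                                            my_model_mmbench_evaluator,
                                            my_model_mmbench_load_from,
                                            my_model_mmbench_model)

models = [my_model_mmbench_model]
datasets = [my_model_mmbench_dataloader]
evaluators = [my_model_mmbench_evaluator]
load_froms = [my_model_mmbench_load_from]

# set the platform and resources
num_gpus = 8
num_procs = 8
launcher = 'pytorch'
```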
### Details of Task
Here is an example of MiniGPT-4 with MMBench, with comments to help users understand the meaning of the keys in the config.
```python
from opencompass.multimodal.models.minigpt_4 import (
    MiniGPT4MMBenchPromptConstructor, MiniGPT4MMBenchPostProcessor)
# dataloader settings
# Here we use Transforms in MMPreTrain to process images
val_pipeline = [
    dict(type='mmpretrain.torchvision/Resize',
         size=(224, 224),
         interpolation=3),
    dict(type='mmpretrain.torchvision/ToTensor'),
    dict(type='mmpretrain.torchvision/Normalize',
         mean=(0.48145466, 0.4578275, 0.40821073),
         std=(0.26862954, 0.26130258, 0.27577711)),
    dict(type='mmpretrain.PackInputs',
         algorithm_keys=[
             'question', 'category', 'l2-category', 'context', 'index',
             'options_dict', 'options', 'split'
         ])
]
# The MMBench dataset that loads the evaluation data
dataset = dict(type='opencompass.MMBenchDataset',
               data_file='data/mmbench/mmbench_test_20230712.tsv',
               pipeline=val_pipeline)
minigpt_4_mmbench_dataloader = dict(batch_size=1,
                                    num_workers=4,
                                    dataset=dataset,
                                    collate_fn=dict(type='pseudo_collate'),
                                    sampler=dict(type='DefaultSampler',
                                                 shuffle=False))
# model settings
minigpt_4_mmbench_model = dict(
    type='minigpt-4',  # the multimodal algorithm under test; the type is registered in `opencompass/multimodal/models/minigpt_4.py` via `@MM_MODELS.register_module('minigpt-4')`
    low_resource=False,
    llama_model='/path/to/vicuna-7b/',  # the model path of the LLM
    prompt_constructor=dict(type=MiniGPT4MMBenchPromptConstructor,  # the PromptConstructor that builds the prompt
                            image_prompt='###Human: <Img><ImageHere></Img>',
                            reply_prompt='###Assistant:'),
    post_processor=dict(type=MiniGPT4MMBenchPostProcessor))  # the PostProcessor that converts the output into the required format
# evaluation settings
minigpt_4_mmbench_evaluator = [
    dict(type='opencompass.DumpResults',  # the evaluator dumps results to `save_path`; code can be found in `opencompass/metrics/dump_results.py`
         save_path='work_dirs/minigpt-4-7b-mmbench.xlsx')
]
minigpt_4_mmbench_load_from = '/path/to/prerained_minigpt4_7b.pth'  # the checkpoint of the linear projection layer between the Q-Former and the LLM in MiniGPT-4
```
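`DumpResults` writes the predictions to an `.xlsx` file, so you can inspect them with pandas; a minimal sketch (the exact column layout depends on the dataset and is not specified here):
```python
import pandas as pd

# load the results dumped by the DumpResults evaluator
df = pd.read_excel('work_dirs/minigpt-4-7b-mmbench.xlsx')
print(df.shape)   # number of evaluated samples and columns
print(df.head())  # peek at the first few predictions
```
For the MMBench test split, ground-truth answers are not public, so this file is what you submit to the official MMBench evaluation service.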