---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL-Chat-V1-5
pipeline_tag: visual-question-answering
---
## Citation
If you use this fine-tuned model checkpoint in your research, please cite our paper as follows:
```bibtex
@misc{zhang2024visualquestiondecompositionmultimodal,
title={Visual Question Decomposition on Multimodal Large Language Models},
author={Haowei Zhang and Jianzhe Liu and Zhen Han and Shuo Chen and Bailan He and Volker Tresp and Zhiqiang Xu and Jindong Gu},
year={2024},
eprint={2409.19339},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.19339},
}
```