---
license: mit
language:
- en
base_model:
- OpenGVLab/InternVL-Chat-V1-5
pipeline_tag: visual-question-answering
---

## Citation

If you use this finetuned model checkpoint in your research, please cite our paper as follows:

```bibtex
@misc{zhang2024visualquestiondecompositionmultimodal,
      title={Visual Question Decomposition on Multimodal Large Language Models},
      author={Haowei Zhang and Jianzhe Liu and Zhen Han and Shuo Chen and Bailan He and Volker Tresp and Zhiqiang Xu and Jindong Gu},
      year={2024},
      eprint={2409.19339},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.19339},
}
```