---
license: mit
pipeline_tag: image-text-to-text
language:
- en
base_model:
- OpenGVLab/InternVL-Chat-V1-5
---

## Citation

If you use this finetuned model checkpoint in your research, please cite our paper as follows:

```bibtex
@misc{zhang2024visualquestiondecompositionmultimodal,
      title={Visual Question Decomposition on Multimodal Large Language Models},
      author={Haowei Zhang and Jianzhe Liu and Zhen Han and Shuo Chen and Bailan He and Volker Tresp and Zhiqiang Xu and Jindong Gu},
      year={2024},
      eprint={2409.19339},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.19339},
}
```