Papers
arxiv:2407.03320

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Published on Jul 3
· Submitted by myownskyW7 on Jul 4
#1 Paper of the day

Abstract

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
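
The long-context claim above (training on 24K-token interleaved contexts, extended to 96K at inference via RoPE extrapolation) rests on rescaling the rotary position embeddings so that positions beyond the training window still fall within the frequency range the model saw during training. The sketch below is a minimal, illustrative Python rendering of one common rescaling scheme (position interpolation), not the exact recipe used by IXC-2.5; the head dimension of 128 and the frequency base of 10000 are assumptions chosen for illustration, while the scale factor of 4 simply reflects the 96K / 24K ratio from the abstract.

    # Minimal sketch of RoPE rotation angles with position-interpolation style
    # rescaling. Illustrative only; not the exact extrapolation scheme of IXC-2.5.
    import numpy as np

    def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
        """Return RoPE rotation angles for the given token positions.

        positions : 1-D array of integer token positions
        dim       : attention head dimension (must be even); 128 is an assumption
        base      : RoPE frequency base; 10000 is the common default, assumed here
        scale     : values > 1 compress positions so that positions beyond the
                    training window map back into the trained range
        """
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,) frequencies
        pos = np.asarray(positions, dtype=np.float64) / scale    # rescaled positions
        return np.outer(pos, inv_freq)                           # (len(pos), dim/2) angles

    # Trained context of 24K tokens, target context of 96K tokens -> scale = 4.
    train_ctx, target_ctx = 24_000, 96_000
    sample_positions = np.arange(0, target_ctx, 8_000)
    angles = rope_angles(sample_positions, scale=target_ctx / train_ctx)
    print(angles.shape)  # (12, 64): angles for a few sampled positions

With scale = 4, position 96,000 is rotated as if it were position 24,000, keeping all angles inside the range covered during training; the paper's own extrapolation details should be consulted for the method actually used.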

Community

Hi @myownskyW7, congrats on your work 🔥 It would be great if you could link the demo to this paper by adding arxiv.org/abs/2407.03320 to the README.

Congrats on this work! In addition to what @AdinaY said, the model could also be linked to this paper. See here for more info: https://huggingface.co/docs/hub/en/model-cards#linking-a-paper

Kudos @myownskyW7 and team. I've featured this paper in my AI research newsletter https://www.aitidbits.ai/p/july-4th-2024#:~:text=Vision-,Researchers,-open%20source%20InternLM

Looking forward to more novel papers and methods.


Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 13