Papers
arxiv:2407.03320

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Published on Jul 3
· Submitted by myownskyW7 on Jul 4
#1 Paper of the day

Abstract

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
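
The long-context claim above (training on 24K-token interleaved contexts, extended to 96K at inference via RoPE extrapolation) rests on rescaling the rotary position embeddings so that positions beyond the training window still fall within the frequency range the model saw during training. The sketch below is a minimal, illustrative Python rendering of one common rescaling scheme (position interpolation), not the exact recipe used by IXC-2.5; the head dimension of 128 and the frequency base of 10000 are assumptions chosen for illustration, while the scale factor of 4 simply reflects the 96K / 24K ratio from the abstract.

    # Minimal sketch of RoPE rotation angles with position-interpolation style
    # rescaling. Illustrative only; not the exact extrapolation scheme of IXC-2.5.
    import numpy as np

    def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
        """Return RoPE rotation angles for the given token positions.

        positions : 1-D array of integer token positions
        dim       : attention head dimension (must be even); 128 is an assumption
        base      : RoPE frequency base; 10000 is the common default, assumed here
        scale     : values > 1 compress positions so that positions beyond the
                    training window map back into the trained range
        """
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,) frequencies
        pos = np.asarray(positions, dtype=np.float64) / scale    # rescaled positions
        return np.outer(pos, inv_freq)                           # (len(pos), dim/2) angles

    # Trained context of 24K tokens, target context of 96K tokens -> scale = 4.
    train_ctx, target_ctx = 24_000, 96_000
    sample_positions = np.arange(0, target_ctx, 8_000)
    angles = rope_angles(sample_positions, scale=target_ctx / train_ctx)
    print(angles.shape)  # (12, 64): angles for a few sampled positions

With scale = 4, position 96,000 is rotated as if it were position 24,000, keeping all angles inside the range covered during training; the paper's own extrapolation details should be consulted for the method actually used.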

Community

Hi @myownskyW7, congrats on your work 🔥 It would be great if you could link the demo to this paper by adding arxiv.org/abs/2407.03320 to the README.

Congrats on this work! In addition to what @AdinaY said, the model could also be linked to this paper. See here for more info: https://huggingface.co/docs/hub/en/model-cards#linking-a-paper

Kudos @myownskyW7 and team. I've featured this paper in my AI research newsletter https://www.aitidbits.ai/p/july-4th-2024#:~:text=Vision-,Researchers,-open%20source%20InternLM

Looking forward to more novel papers and methods.


Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 13