arxiv:2311.04257

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Published on Nov 7, 2023 · Submitted by akhaliq on Nov 9, 2023
#2 Paper of the day

Abstract

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction-following abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 generalizes to both text tasks and multi-modal tasks and achieves state-of-the-art performance with a single generic model. Notably, mPLUG-Owl2 is the first MLLM that demonstrates the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path in the development of future multi-modal foundation models.
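
To make the idea of shared functional modules combined with a modality-adaptive module more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the class name ModalityAdaptiveLayer, the choice of per-modality normalization parameters, and the single shared projection are all assumptions made for illustration. The sketch only shows the general pattern the abstract describes, where each token is first processed with parameters specific to its modality and then passed through weights shared by all modalities.

```python
import math


def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize one token vector, then apply an element-wise gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gain, bias)]


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class ModalityAdaptiveLayer:
    """Hypothetical layer: per-modality normalization, one shared projection."""

    def __init__(self, dim, modalities=("text", "image")):
        # Modality-specific parameters (assumption: one gain/bias pair each).
        self.norm = {m: {"gain": [1.0] * dim, "bias": [0.0] * dim}
                     for m in modalities}
        # Shared parameters used by every modality (identity initialization).
        self.shared_w = [[1.0 if i == j else 0.0 for j in range(dim)]
                         for i in range(dim)]

    def forward(self, tokens, modality_ids):
        """Route each token through its own norm, then the shared projection."""
        outputs = []
        for x, modality in zip(tokens, modality_ids):
            params = self.norm[modality]
            h = layer_norm(x, params["gain"], params["bias"])
            outputs.append(matvec(self.shared_w, h))
        return outputs


if __name__ == "__main__":
    layer = ModalityAdaptiveLayer(dim=2)
    # Give the image branch a different gain so the two paths diverge.
    layer.norm["image"]["gain"] = [2.0, 2.0]
    same_vector = [1.0, 3.0]
    out = layer.forward([same_vector, same_vector], ["text", "image"])
    print(out)
```

In this sketch the same input vector produces different outputs depending on its modality tag, because only the normalization parameters differ, while the projection weights stay common to both paths.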

Community

Training, inference and evaluation code, plus trained models with a demo. If only all papers were released this complete.

Outstanding release! Been looking forward to this one!

Thank you team!
