Papers
arxiv:2407.06135

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Published on Jul 8
· Submitted by ethanchern on Jul 9
Authors:
,

Abstract

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation. To mitigate these limitations, we present Anole, an open, autoregressive, native large multimodal model for interleaved image-text generation. We build Anole from Meta AI's Chameleon, adopting an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation capabilities. We have open-sourced our model, training framework, and instruction tuning data.

Community

Paper author Paper submitter
edited Jul 9

Abstract

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation. To mitigate these limitations, we present Anole, an open, autoregressive, native large multimodal model for interleaved image-text generation. We build Anole from Meta AI's Chameleon, adopting an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation capabilities. We have open-sourced our model, training framework, and instruction tuning data.

Homepage: https://gair-nlp.github.io/anole
Code: https://github.com/GAIR-NLP/anole
Hugging Face Model: https://huggingface.co/GAIR/Anole-7b-v0.1

·

Hi @ethanchern congrats on this work! Some remarks:

Let me know if you need any help!

This comment has been hidden

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2407.06135 in a dataset README.md to link it from this page.

Spaces citing this paper 2

Collections including this paper 8