ConvLLaVA Model Card

Model details

Model type: ConvLLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: lmsys/vicuna-7b-v1.5

Model date: ConvLLaVA-1024 was trained in March 2024.

Paper or resources for more information: https://github.com/alibaba/conv-llava/

License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Where to send questions or comments about the model: https://github.com/alibaba/conv-llava/issues

Intended use

Primary intended uses: The primary use of ConvLLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

  • 1.2M ShareGPT4V-PT caption data.
  • 100K ShareGPT4V caption data.
  • 1.4M ALLaVA caption and instruction data.
  • 186K VFLAN multitask data.
  • 158K GPT-generated multimodal instruction-following data.
  • 500K academic-task-oriented VQA data mixture.
  • 40K ShareGPT data.
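
To get a rough sense of the relative weight of each source, the listed sizes can be tallied as in the sketch below. The counts are the rounded figures from the list above. The dictionary keys and the helper `mixture_shares` are illustrative names, not part of the ConvLLaVA codebase, and the sum says nothing about how the data is split across training stages.

```python
# Approximate sample counts, copied from the rounded figures in the list above.
# Key names are illustrative labels, not official dataset identifiers.
DATA_MIXTURE = {
    "ShareGPT4V-PT captions": 1_200_000,
    "ShareGPT4V captions": 100_000,
    "ALLaVA caption and instruction": 1_400_000,
    "VFLAN multitask": 186_000,
    "GPT-generated multimodal instruction-following": 158_000,
    "Academic-task-oriented VQA mixture": 500_000,
    "ShareGPT": 40_000,
}


def mixture_shares(mixture):
    """Return each source's fraction of the total listed sample count."""
    total = sum(mixture.values())
    return {name: count / total for name, count in mixture.items()}


total = sum(DATA_MIXTURE.values())
shares = mixture_shares(DATA_MIXTURE)

if __name__ == "__main__":
    print(f"Total listed samples (approx.): {total:,}")
    # Print sources from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:50s} {share:6.1%}")
```

Because the card reports rounded sizes, the resulting shares are only indicative.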

Paper

https://arxiv.org/abs/2405.15738
