---
language:
- zh
- en
pipeline_tag: visual-question-answering
datasets:
- Lin-Chen/ShareGPT4V
- liuhaotian/LLaVA-Pretrain
---
## Model

llava-qwen1.5-4b-chat is a lightweight multimodal model based on the LLaVA architecture. It combines the components listed below; a structural sketch follows the list.
- Language Model: Qwen/Qwen1.5-4B-Chat
- Vision Encoder: google/siglip-so400m-patch14-384
- Total Parameters: 4,388,102,720
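
The snippet below is a minimal sketch of how these components fit together in a LLaVA-style design: SigLIP patch features are mapped through a small projector into the Qwen1.5 embedding space. The class name, projector shape, and method names are illustrative assumptions, not the released training code.

```python
# Illustrative sketch only; not the actual implementation of llava-qwen1.5-4b-chat.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, SiglipVisionModel


class LlavaQwenSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Vision encoder and language model named in the component list above
        self.vision = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
        self.llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-4B-Chat")
        # Assumed 2-layer MLP projector from the vision hidden size to the LLM hidden size
        v_dim = self.vision.config.hidden_size
        l_dim = self.llm.config.hidden_size
        self.projector = nn.Sequential(
            nn.Linear(v_dim, l_dim),
            nn.GELU(),
            nn.Linear(l_dim, l_dim),
        )

    def encode_image(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # SigLIP patch features, projected into the LLM token embedding space,
        # where they are consumed alongside text token embeddings
        feats = self.vision(pixel_values).last_hidden_state
        return self.projector(feats)
```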
## Evaluation

### MMBench
| Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev |
| --- | --- | --- | --- | --- | --- |
| LLaVA-v1.5-7B | 67.7 | 69.2 | 61.0 | 59.7 | 28.4 |
| LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3 |
| LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5 |
| Bunny-3B | 69.2 | 68.6 | - | - | - |
| MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4 |
| llava-qwen1.5-4b-chat | 69.6 | 69.2 | 68.6 | 68.3 | 41.0 |
## Uses

TBD

## Training Details

TBD