Model Summary
MobileVLM V2 is a family of vision language models that significantly improves upon MobileVLM. It shows that a careful orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and the curation of rich, high-quality datasets can substantially benefit VLM performance. Specifically, MobileVLM V2 1.7B matches or outperforms much larger 3B-scale VLMs on standard VLM benchmarks. Notably, the MobileVLM_V2-3B model outperforms a wide variety of VLMs at the 7B+ scale.
MobileVLM_V2-7B is built on Vicuna-7B-v1.5 to facilitate off-the-shelf deployment.
Model Sources
- Repository: https://github.com/Meituan-AutoML/MobileVLM
- Paper: MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
How to Get Started with the Model
Inference examples can be found in the GitHub repository linked above.
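The snippet below is a minimal sketch of how the generation settings for a single image-plus-question query might be gathered before calling the repository's inference script. The field names follow the inference example in the MobileVLM README as we understand it. Treat them, the `scripts.inference.inference_once` entry point, and the Hub ID `mtgv/MobileVLM_V2-7B` as assumptions to verify against the current repository. The helper `build_inference_args` and the image path are hypothetical, introduced only for illustration.

```python
from types import SimpleNamespace

# Assumed Hugging Face Hub ID for this checkpoint; verify on the model page.
MODEL_PATH = "mtgv/MobileVLM_V2-7B"


def build_inference_args(image_file: str, prompt: str, model_path: str = MODEL_PATH) -> SimpleNamespace:
    """Hypothetical helper: collect generation settings into an attribute-style object.

    Field names are assumed to mirror the MobileVLM README inference example;
    they may differ between repository versions.
    """
    return SimpleNamespace(
        model_path=model_path,
        image_file=image_file,
        prompt=prompt,
        conv_mode="v1",       # conversation template name (assumed, per the README example)
        temperature=0,        # 0 -> deterministic (greedy) decoding
        top_p=None,           # nucleus sampling disabled
        num_beams=1,          # no beam search
        max_new_tokens=512,   # upper bound on generated tokens
        load_8bit=False,      # set True to load weights in 8-bit on low-memory devices
        load_4bit=False,      # set True to load weights in 4-bit
    )


if __name__ == "__main__":
    args = build_inference_args(
        image_file="assets/samples/demo.jpg",  # hypothetical local image path
        prompt="What is the title of this book?\nAnswer the question using a single word or phrase.",
    )
    try:
        # Assumed entry point; only importable from the root of a MobileVLM checkout.
        from scripts.inference import inference_once
    except ImportError:
        print("Run this from the root of a MobileVLM checkout to perform inference.")
    else:
        inference_once(args)
```

Using a plain attribute object keeps the sketch independent of any command-line parser, so the same settings can be passed whether the script expects `argparse`-style arguments or a simple namespace.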