@qnguyen3 on Hugging Face: "🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the…"

qnguyen3 posted an update Apr 10

🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. 🚀 This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. 📱💻

Model: qnguyen3/nanoLLaVA 🔍
Spaces: qnguyen3/nanoLLaVA (thanks to @merve)

Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. 📊
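For readers sizing this up for an edge device, here is a rough back-of-the-envelope sketch of how much memory the weights alone might need at different precisions. The parameter counts are nominal values read off the component names (about 0.5B for the Qwen1.5-0.5B finetune and about 0.4B for the so400m SigLIP encoder); they are assumptions, not measured figures. The sketch ignores the projector, activations, the KV cache, and framework overhead, so actual VRAM use will be higher.

```python
# Rough estimate of weight memory for a two-component vision language model.
# The parameter counts below are NOMINAL values inferred from the model names
# (assumptions for illustration, not measurements of the released checkpoint).
NOMINAL_PARAMS = {
    "language_model (Qwen1.5-0.5B finetune)": 0.5e9,
    "vision_encoder (siglip-so400m)": 0.4e9,
}

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


total_params = sum(NOMINAL_PARAMS.values())

if __name__ == "__main__":
    print(f"nominal total parameters: {total_params:.2e}")
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_gib(total_params, dtype)
        print(f"{dtype}: ~{gib:.2f} GiB for weights only")
```

Halving the precision halves the weight footprint, which is why the choice of dtype matters as much as the parameter count when targeting small GPUs.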

In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝

ggcristian, Jun 24

Excited to read the final paper! I hope this model starts a trend of MM-LLMs becoming more efficient and portable. Exciting opportunities for researchers without a lot of compute :)