view post Post 1574 Reply 🤗 transformers pipelines now support vision language models for easy local inference 🫰🏻 h/t @yonigozlan for shipping this 🎩👏you can also use inference API to infer hosted vision LMs (via Python, JS and cURL) https://huggingface.co/docs/api-inference/en/tasks/image-text-to-text
view post Post 4629 Reply OmniVision-968M: a new local VLM for edge devices, fast & small but performant💨 a new vision language model with 9x less image tokens, super efficient 📖 aligned with DPO for reducing hallucinations⚡️ Apache 2.0 license 🔥Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demoModel NexaAIDev/omnivision-968M
Nov 15 Releases 🍂 microsoft/LLM2CLIP-EVA02-L-14-336 Zero-Shot Image Classification • Updated 5 days ago • 1.01k • 36 microsoft/LLM2CLIP-EVA02-B-16 Updated 10 days ago • 238 • 6 PleIAs/common_corpus Viewer • Updated 6 days ago • 397M • 31.2k • 143 Qwen/Qwen2.5-Coder-32B-Instruct Text Generation • Updated 3 days ago • 49.9k • • 865
Nov 1 Releases Running on Zero 64 🌖 LongVU facebook/MobileLLM-1B Text Generation • Updated 20 days ago • 9.02k • 108 Vision-CAIR/LongVU_Qwen2_7B Video-Text-to-Text • Updated 22 days ago • 1.18k • 55 Vision-CAIR/LongVU_Llama3_2_3B_img Updated 29 days ago • 99 • 6