LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 19 days ago • 107
InstructPix2Pix: Learning to Follow Image Editing Instructions Paper • 2211.09800 • Published Nov 17, 2022 • 3
Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Article • Jun 24 • 179
Post 3065 NEW: Open Source Text/Image to video model is out - MIT licensed - Rivals Gen-3, Pika & Kling 🔥
> Pyramid Flow: Training-efficient Autoregressive Video Generation method
> Utilizes Flow Matching
> Trains on open-source datasets
> Generates high-quality 10-second videos
> Video resolution: 768p
> Frame rate: 24 FPS
> Supports image-to-video generation
> Model checkpoints available on the hub 🤗: rain1011/pyramid-flow-sd3