CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 4
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework Paper • 2311.10125 • Published Nov 16, 2023 • 4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Paper • 2311.10122 • Published Nov 16, 2023 • 26
Localized Symbolic Knowledge Distillation for Visual Commonsense Models Paper • 2312.04837 • Published Dec 8, 2023 • 2