SILC: Improving Vision Language Pretraining with Self-Distillation Paper • 2310.13355 • Published Oct 20, 2023 • 7
Woodpecker: Hallucination Correction for Multimodal Large Language Models Paper • 2310.16045 • Published Oct 24, 2023 • 14
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Paper • 2201.12086 • Published Jan 28, 2022 • 3
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories Paper • 2305.15028 • Published May 24, 2023 • 1
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023 • 19
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 20
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? Paper • 2311.00047 • Published Oct 31, 2023 • 8
TroL: Traversal of Layers for Large Language and Vision Models Paper • 2406.12246 • Published Jun 18 • 34
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24 • 57
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24 • 25
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31