Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17 • 27
Generalizable Entity Grounding via Assistance of Large Language Model Paper • 2402.02555 • Published Feb 4
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries Paper • 2404.00086 • Published Mar 29
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow Paper • 2405.20282 • Published May 30
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning Paper • 2403.12003 • Published Mar 18 • 2
LLAVADI: What Matters For Multimodal Large Language Models Distillation Paper • 2407.19409 • Published Jul 28
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10 • 49
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51 • 10