IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs Paper • 2405.02842 • Published May 5 • 1
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 22
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Paper • 2406.20076 • Published Jun 28 • 8
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27 • 60
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 20