- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output (arXiv:2407.03320)
- Understanding Alignment in Multimodal LLMs: A Comprehensive Study (arXiv:2407.02477)
- TokenPacker: Efficient Visual Projector for Multimodal LLM (arXiv:2407.02392)
- Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491)
- WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences (arXiv:2406.11069)
- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens (arXiv:2406.11271)
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models (arXiv:2406.11839)
- MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs (arXiv:2406.11833)
- CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark (arXiv:2406.05967)
- Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models (arXiv:2406.09403)
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (arXiv:2406.08418)
- ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation (arXiv:2406.09961)
- MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding (arXiv:2406.09411)
- Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models (arXiv:2406.08487)
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series (arXiv:2405.19327, published May 29)
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models (arXiv:2405.15738, published May 24)
- PaliGemma Release (collection): pretrained and mix checkpoints for PaliGemma, 16 items
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation (arXiv:2404.19427, published Apr 30)
- How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites (arXiv:2404.16821, published Apr 25)
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model (arXiv:2404.01331, published Mar 29)
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (arXiv:2403.05530, published Mar 8)