LLaVA-o1: Let Vision Language Models Reason Step-by-Step • arXiv:2411.10440 • Published Nov 2024
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models • arXiv:2409.17066 • Published Sep 25, 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models • arXiv:2409.17481 • Published Sep 26, 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • arXiv:2409.17146 • Published Sep 25, 2024
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • arXiv:2409.12191 • Published Sep 18, 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey • arXiv:2409.02795 • Published Sep 4, 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming • arXiv:2408.16725 • Published Aug 29, 2024
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models • arXiv:2408.12114 • Published Aug 22, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos • arXiv:2408.10188 • Published Aug 19, 2024