An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13 • 50
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15 • 24
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1 • 18
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1 • 30
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 68
SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published Apr 22 • 23
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 53
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18 • 38
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18 • 24
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 30
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9 • 33
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 103
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 64
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4 • 27