Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models (arXiv:2409.18943, published 8 days ago)
Addition is All You Need for Energy-efficient Language Models (arXiv:2410.00907, published 4 days ago)
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models (arXiv:2409.17066, published 10 days ago)
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv:2408.07055, published Aug 13, 2024)
Scalify: scale propagation for efficient low-precision LLM training (arXiv:2407.17353, published Jul 24, 2024)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (arXiv:2407.11062, published Jul 10, 2024)
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models (arXiv:2407.12327, published Jul 17, 2024)
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA (arXiv:2406.17419, published Jun 25, 2024)
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks (arXiv:2406.12066, published Jun 17, 2024)
An Image is Worth 32 Tokens for Reconstruction and Generation (arXiv:2406.07550, published Jun 11, 2024)
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization (arXiv:2405.15071, published May 23, 2024)
BLINK: Multimodal Large Language Models Can See but Not Perceive (arXiv:2404.12390, published Apr 18, 2024)
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models (arXiv:2404.12387, published Apr 18, 2024)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764, published Feb 27, 2024)
SDXL-Lightning: Progressive Adversarial Diffusion Distillation (arXiv:2402.13929, published Feb 21, 2024)
Instruction-tuned Language Models are Better Knowledge Learners (arXiv:2402.12847, published Feb 20, 2024)
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (arXiv:2402.00159, published Jan 31, 2024)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024, published Jan 26, 2024)
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (arXiv:2401.08671, published Jan 9, 2024)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081, published Jan 8, 2024)
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU (arXiv:2312.12456, published Dec 16, 2023)
Generative Multimodal Models are In-Context Learners (arXiv:2312.13286, published Dec 20, 2023)
LLM-FP4: 4-Bit Floating-Point Quantized Transformers (arXiv:2310.16836, published Oct 25, 2023)
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots (arXiv:2310.13724, published Oct 19, 2023)
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453, published Oct 17, 2023)
Enable Language Models to Implicitly Learn Self-Improvement From Data (arXiv:2310.00898, published Oct 2, 2023)
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model (arXiv:2309.16058, published Sep 27, 2023)
Large Language Models Cannot Self-Correct Reasoning Yet (arXiv:2310.01798, published Oct 3, 2023)
DreamLLM: Synergistic Multimodal Comprehension and Creation (arXiv:2309.11499, published Sep 20, 2023)
TextBind: Multi-turn Interleaved Multimodal Instruction-following (arXiv:2309.08637, published Sep 14, 2023)
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968, published Sep 16, 2023)
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages (arXiv:2309.09400, published Sep 17, 2023)
Efficient Memory Management for Large Language Model Serving with PagedAttention (arXiv:2309.06180, published Sep 12, 2023)
Textbooks Are All You Need II: phi-1.5 technical report (arXiv:2309.05463, published Sep 11, 2023)