- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
- Magicoder: Source Code Is All You Need
  Paper • 2312.02120 • Published • 79
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 100
Collections including paper arxiv:2401.04088
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 36
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 6
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 89
- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53