LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31 • 21
Scavenging Hyena: Distilling Transformers into Long Convolution Models Paper • 2401.17574 • Published Jan 31 • 15
Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30 • 21
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 69
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Paper • 2402.04858 • Published Feb 7 • 14
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6 • 13
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14 • 26
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21 • 51