- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — arXiv:2403.03507, published Mar 6
- Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish — arXiv:2402.09759, published Feb 15
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts — arXiv:2401.04081, published Jan 8