Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models Paper • 2311.08692 • Published Nov 15, 2023 • 12
DiLoCo: Distributed Low-Communication Training of Language Models Paper • 2311.08105 • Published Nov 14, 2023 • 14
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 39
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning Paper • 2312.06134 • Published Dec 11, 2023 • 2