Transformers Can Do Arithmetic with the Right Embeddings. arXiv:2405.17399, published May 27, 2024.
How Do Large Language Models Acquire Factual Knowledge During Pretraining? arXiv:2406.11813, published Jun 17, 2024.
Efficient Continual Pre-training by Mitigating the Stability Gap. arXiv:2406.14833, published Jun 21, 2024.
Unlocking Continual Learning Abilities in Language Models. arXiv:2406.17245, published Jun 25, 2024.