When can transformers reason with abstract symbols?
Abstract
We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications that, by adding just two trainable parameters per head, can reduce the amount of training data needed.
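To make the last claim concrete, the sketch below shows one way an attention head could be augmented with exactly two trainable scalars per head. This is purely illustrative and is not the paper's actual parameterization (which is not specified in this abstract): the module name `GatedRelationalHead` and the gates `alpha`/`beta`, which mix content-based with position-based attention scores, are hypothetical choices for the sketch.

```python
# Illustrative sketch only: an attention head with two extra trainable scalars
# (alpha, beta) that gate content-based vs. position-based attention scores.
# Not the paper's actual modification; names and structure are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedRelationalHead(nn.Module):
    def __init__(self, d_model: int, d_head: int, max_len: int = 512):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_head, bias=False)
        self.W_k = nn.Linear(d_model, d_head, bias=False)
        self.W_v = nn.Linear(d_model, d_head, bias=False)
        # The two extra trainable parameters for this head.
        self.alpha = nn.Parameter(torch.ones(1))   # weight on content-based scores
        self.beta = nn.Parameter(torch.zeros(1))   # weight on position-based scores
        # Learned positional codes used only for the position-based score term.
        self.pos = nn.Parameter(torch.randn(max_len, d_head) / math.sqrt(d_head))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        # Standard scaled dot-product scores between token contents.
        content = q @ k.transpose(-1, -2) / math.sqrt(q.shape[-1])
        # Scores that depend only on positions, shared across the batch.
        p = self.pos[:n]
        positional = (p @ p.transpose(-1, -2)).unsqueeze(0)  # (1, n, n)
        scores = self.alpha * content + self.beta * positional
        attn = F.softmax(scores, dim=-1)
        return attn @ v  # (batch, seq_len, d_head)
```

The intuition behind this kind of sketch is that a head gated toward position-based (rather than symbol-specific) attention can implement relational patterns that transfer across arbitrary abstract symbols; whether this matches the paper's construction would need to be checked against the full text.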
Community
The Librarian Bot found the following similar papers, recommended by the Semantic Scholar API:
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (2023)
- Improving Length-Generalization in Transformers via Task Hinting (2023)
- A Meta-Learning Perspective on Transformers for Causal Language Modeling (2023)
- Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions (2023)
- Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages (2023)