System 2 Attention (is something you might need too)
Abstract
Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next-token generation. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to include only the relevant portions, then attends to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinionated or irrelevant information: QA, math word problems, and longform generation. On these tasks S2A increases factuality and objectivity, and decreases sycophancy.
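The two-step procedure the abstract describes can be sketched as a simple prompting pipeline. This is a minimal sketch, not the paper's exact prompts: `call_llm` is a placeholder for any instruction-following LLM API, and the prompt wording is an assumption.

```python
def s2a_respond(context, question, call_llm):
    """Two-step System 2 Attention sketch.

    Step 1 regenerates the context so it contains only material
    relevant to the question; step 2 answers from that cleaned
    context. `call_llm` is a hypothetical callable that takes a
    prompt string and returns the model's text response.
    """
    # Step 1: ask the model to rewrite the context, dropping
    # opinions and irrelevant information (the S2A regeneration).
    rewrite_prompt = (
        "Extract the parts of the following text that are relevant "
        "to answering the question, removing opinions and "
        "irrelevant information.\n\n"
        f"Text: {context}\n\nQuestion: {question}"
    )
    cleaned_context = call_llm(rewrite_prompt)

    # Step 2: elicit the final response using only the
    # regenerated context, not the original one.
    answer_prompt = f"Context: {cleaned_context}\n\nQuestion: {question}"
    return call_llm(answer_prompt)
```

The key design point is that the second call never sees the original context, so distracting or sycophancy-inducing text removed in step 1 cannot influence the final answer.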
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Attention Sorting Combats Recency Bias In Long Context Language Models (2023)
- Auto-ICL: In-Context Learning without Human Supervision (2023)
- Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention (2023)
- Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism (2023)
- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs (2023)
Revolutionizing LLMs: How System 2 Attention Enhances Accuracy and Objectivity!