Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs Paper • 2311.02262 • Published Nov 3, 2023
Tree Prompting: Efficient Task Adaptation without Fine-Tuning Paper • 2310.14034 • Published Oct 21, 2023
Explaining black box text modules in natural language with language models Paper • 2305.09863 • Published May 17, 2023