stereoplegic's Collections
Interpretability
A technical note on bilinear layers for interpretability
Paper
•
2305.03452
•
Published
•
1
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT
Paper
•
2305.13417
•
Published
•
1
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?
Paper
•
2211.12821
•
Published
•
1
The Linear Representation Hypothesis and the Geometry of Large Language Models
Paper
•
2311.03658
•
Published
•
1
Interpreting Pretrained Language Models via Concept Bottlenecks
Paper
•
2311.05014
•
Published
•
1
White-Box Transformers via Sparse Rate Reduction
Paper
•
2306.01129
•
Published
•
1
ICICLE: Interpretable Class Incremental Continual Learning
Paper
•
2303.07811
•
Published
•
1
Differentiable Model Selection for Ensemble Learning
Paper
•
2211.00251
•
Published
•
1
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
Paper
•
2210.04633
•
Published
•
1
Forms of Understanding of XAI-Explanations
Paper
•
2311.08760
•
Published
•
1
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Paper
•
2307.01201
•
Published
•
2
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace
Paper
•
2305.15775
•
Published
•
1
Causal Analysis for Robust Interpretability of Neural Networks
Paper
•
2305.08950
•
Published
•
1
Emergence of Segmentation with Minimalistic White-Box Transformers
Paper
•
2308.16271
•
Published
•
13
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
Paper
•
2311.13110
•
Published
•
1
LLM360: Towards Fully Transparent Open-Source LLMs
Paper
•
2312.06550
•
Published
•
56
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Paper
•
2401.06102
•
Published
•
19
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Paper
•
2310.16270
•
Published
•
1