arxiv:2312.05491

Using Captum to Explain Generative Language Models

Published on Dec 9, 2023 · Submitted by akhaliq on Dec 12, 2023
Abstract

Captum is a comprehensive library for model explainability in PyTorch, offering a range of methods from the interpretability literature to enhance users' understanding of PyTorch models. In this paper, we introduce new features in Captum that are specifically designed to analyze the behavior of generative language models. We provide an overview of the available functionalities and example applications of their potential for understanding learned associations within generative language models.

Community

Great work! I previously used Captum to compute Shapley values (SV) to explain natural language models in classification tasks, and it is great to see new features for explaining generation. In our work (forgive the shameless self-plug: https://arxiv.org/pdf/2305.19998.pdf), we find that 1) the choice of random seed can somewhat influence the explanation results, and 2) computing SV for a large language model is costly if you want a stable explanation, because that requires a large sample size. If generation is cast as a sequence of consecutive classification tasks, I think that is still the case. We developed an amortized model that achieves a better stability-efficiency trade-off and even allows online computation of SV for LLMs. I would love to chat about incorporating my work into Captum or extending it to the generation setting :-)
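To make the seed and sample-size point in the comment above concrete, here is a minimal sketch of permutation-sampling Shapley values in plain Python. It does not use Captum or the amortized model from the linked paper. The `value` function is a hypothetical toy stand-in for a model score over four tokens, and `exact_shapley`, `sampled_shapley`, and `seed_spread` are illustrative names, not part of any library.

```python
import random
from itertools import permutations
from statistics import pstdev

N_FEATURES = 4  # hypothetical: four input tokens


def value(present):
    """Toy stand-in for a model score given the set of unmasked token indices."""
    score = 0.0
    if 0 in present:
        score += 1.0  # token 0 contributes on its own
    if 1 in present and 2 in present:
        score += 2.0  # tokens 1 and 2 only matter together
    return score      # token 3 never changes the score


def _add_marginals(order, totals):
    """Accumulate each token's marginal contribution along one ordering.

    With a real model, every call to value() would be a forward pass,
    so one ordering costs len(order) + 1 evaluations.
    """
    present = set()
    prev = value(present)
    for i in order:
        present.add(i)
        cur = value(present)
        totals[i] += cur - prev
        prev = cur


def exact_shapley(n=N_FEATURES):
    """Average marginal contributions over all n! orderings (small n only)."""
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        _add_marginals(order, totals)
    return [t / len(orders) for t in totals]


def sampled_shapley(num_samples, seed, n=N_FEATURES):
    """Monte Carlo estimate from num_samples random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(order)
        _add_marginals(order, totals)
    return [t / num_samples for t in totals]


def seed_spread(num_samples, seeds=range(20), feature=1):
    """Standard deviation of one token's estimate across random seeds."""
    estimates = [sampled_shapley(num_samples, s)[feature] for s in seeds]
    return pstdev(estimates)
```

Comparing `seed_spread(10)` with `seed_spread(2000)` shows the trade-off: the estimate for an interacting token moves noticeably between seeds when few orderings are sampled, and it stabilizes only when many more model evaluations are spent.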
