Extracting attention maps
#49 opened by roeehendel
It seems that since the model uses `scaled_dot_product_attention`, passing `output_attentions=True` to the forward pass is not supported. Attention masking via `attention_mask` is also not supported, and this fails silently: there is no assertion to warn the user. Is there a workaround to enable these features? Perhaps there should be an option to use a regular (eager) implementation of attention instead of `scaled_dot_product_attention`.