Extracting attention maps
#49 opened by roeehendel
It seems that since the model uses `scaled_dot_product_attention`, passing `output_attentions=True` to the forward pass is not supported. Attention masking via `attention_mask` is also not supported, and this fails silently: there is no assertion to warn the user. Is there a workaround to enable these features? Perhaps there should be an option to use a regular (eager) implementation of attention instead of `scaled_dot_product_attention`.