RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and query.dtype: float instead.
When I run the code " candi_emb_3 = model.encode(text="The Mid-Hudson Bridge was designated as a New York State Historic Civil Engineering Landmark by the American Society of Civil Engineers in 1983. The bridge was renamed the "Franklin Delano Roosevelt Mid-Hudson Bridge" in 1994.") ",
I get the error "RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and query.dtype: float instead."
But when I run the code with an image, there is no problem. Why is that?
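For reference, a minimal sketch of the call pattern (assuming the model is loaded as in the Visualized-BGE README; the model name, weight path, and image path below are placeholders, not the actual setup):

import torch
from FlagEmbedding.visual.modeling import Visualized_BGE

# Placeholder checkpoint names -- substitute the actual BGE model and Visualized-BGE weight file.
model = Visualized_BGE(model_name_bge="BAAI/bge-base-en-v1.5", model_weight="Visualized_base_en_v1.5.pth")
model.eval()

with torch.no_grad():
    # Encoding an image (or image + text) works without error.
    candi_emb_1 = model.encode(image="./imgs/cir_candi_1.png")
    # Encoding text only raises the attn_mask dtype RuntimeError quoted above.
    candi_emb_3 = model.encode(text="The Mid-Hudson Bridge was designated as a New York State Historic Civil Engineering Landmark in 1983.")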
Hello.
It seems that this issue was fixed before. Are you using the latest code? https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/visual
If the problem still exists, could you provide detailed error information, including the specific line of code where the error occurs?
Thank you.
Hi @JUNJIE99, I'm also experiencing the same error. Could you please take a look?
I've also opened an issue on GitHub: https://github.com/FlagOpen/FlagEmbedding/issues/1121
@BiXie, have you solved it? If so, could you please share the steps you took?
I've resolved it by setting the dtype default to torch.float32 instead of torch.float16 in FlagEmbedding/FlagEmbedding/visual/modeling.py.
Modified function:
def get_extended_attention_mask(
    self, attention_mask: Tensor, input_shape: Tuple[int], device: torch.device = None, dtype: torch.float = torch.float32
) -> Tensor:
    """
    Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

    Arguments:
        attention_mask (`torch.Tensor`):
            Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
        input_shape (`Tuple[int]`):
            The shape of the input to the model.

    Returns:
        `torch.Tensor` The extended attention mask, with the same dtype as `attention_mask.dtype`.
    """
    # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
    # ourselves in which case we just need to make it broadcastable to all heads.
    if attention_mask.dim() == 3:
        extended_attention_mask = attention_mask[:, None, :, :]
    elif attention_mask.dim() == 2:
        # Provided a padding mask of dimensions [batch_size, seq_length]
        # - if the model is a decoder, apply a causal mask in addition to the padding mask
        # - if the model is an encoder, make the mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
        extended_attention_mask = attention_mask[:, None, None, :]
    else:
        raise ValueError(
            f"Wrong shape for input_ids (shape {input_shape}) or attention_mask (shape {attention_mask.shape})"
        )

    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
    # masked positions, this operation will create a tensor which is 0.0 for
    # positions we want to attend and the dtype's smallest value for masked positions.
    # Since we are adding it to the raw scores before the softmax, this is
    # effectively the same as removing these entirely.
    extended_attention_mask = extended_attention_mask.to(dtype=dtype)  # fp16 compatibility
    extended_attention_mask = (1.0 - extended_attention_mask) * torch.finfo(dtype).min
    return extended_attention_mask
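For anyone reading the workaround: the last two lines build an additive mask, so attended positions contribute 0 to the attention scores and padded positions contribute the dtype's most negative value. A standalone sketch of that transform, independent of the model code:

import torch

attention_mask = torch.tensor([[1, 1, 1, 0]])   # [batch_size, seq_length]; 1 = attend, 0 = padding
extended = attention_mask[:, None, None, :]     # broadcastable shape [batch_size, 1, 1, seq_length]
extended = extended.to(dtype=torch.float32)
extended = (1.0 - extended) * torch.finfo(torch.float32).min
print(extended)
# Attended positions come out as 0.0; the padded position becomes torch.finfo(torch.float32).min
# (about -3.4e38), which effectively removes it after the softmax.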
Hello. I have rechecked the code, and the issue is caused by the data type used during inference. Your solution is a temporary workaround when using FP32 for inference. However, it may not work if FP16 is used for inference.
I will update the code soon to ensure that inference with any data type does not result in errors.
Sorry for the inconvenience.
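For reference, the error message appears to come from PyTorch's input validation for scaled_dot_product_attention, which accepts an attn_mask whose dtype is bool, float32, or the same dtype as the query, so this is a PyTorch-level constraint rather than something specific to FlagEmbedding. A minimal sketch reproducing the mismatch, assuming a recent PyTorch 2.x:

import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)                                # float32 query
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
mask_fp16 = torch.zeros(1, 1, 4, 4, dtype=torch.float16)   # half-precision additive mask

try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask_fp16)
except RuntimeError as e:
    print(e)  # Expected attn_mask dtype to be bool or float or to match query dtype ...

# Casting the mask to the query's dtype resolves the mismatch, whichever precision is used for inference.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_fp16.to(q.dtype))
print(out.shape)  # torch.Size([1, 2, 4, 8])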
I have updated the code; it should be fine now.