RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and query.dtype: float instead.

#3 opened by BiXie

When I run the code "candi_emb_3 = model.encode(text="The Mid-Hudson Bridge was designated as a New York State Historic Civil Engineering Landmark by the American Society of Civil Engineers in 1983. The bridge was renamed the "Franklin Delano Roosevelt Mid-Hudson Bridge" in 1994.")",
I get the error "RuntimeError: Expected attn_mask dtype to be bool or float or to match query dtype, but got attn_mask.dtype: c10::Half and query.dtype: float instead."
But when I run the code with an image, there is no problem. Why is that?
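
For context, a minimal setup that reproduces this looks roughly like the following; the model name and checkpoint path are placeholders, and the loading lines follow the standard Visualized-BGE usage example, so treat this as a sketch rather than the exact script:

    import torch
    from FlagEmbedding.visual.modeling import Visualized_BGE

    # Placeholder model name and checkpoint path.
    model = Visualized_BGE(
        model_name_bge="BAAI/bge-base-en-v1.5",
        model_weight="path/to/Visualized_base_en_v1.5.pth",
    )
    model.eval()

    with torch.no_grad():
        # Encoding an image (or image + text) works.
        candi_emb_img = model.encode(image="./example.png")
        # Encoding text only raises the attn_mask dtype RuntimeError.
        candi_emb_3 = model.encode(text="The Mid-Hudson Bridge was designated as a New York State Historic Civil Engineering Landmark by the American Society of Civil Engineers in 1983.")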

Hello.
It seems that this issue was fixed before. Are you using the latest code? https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/visual
If the problem still exists, could you provide detailed error information, including the specific line of code where the error occurs?
Thank you.

Hi @JUNJIE99, I'm also experiencing the same error. Could you please take a look?

I've also opened an issue on GitHub: https://github.com/FlagOpen/FlagEmbedding/issues/1121

@BiXie have you solved it? If so, could you please share the steps you took?

@BiXie @JUNJIE99

I resolved it by setting the dtype to torch.float32 instead of torch.float16 in FlagEmbedding/FlagEmbedding/visual/modeling.py.

Modified function:

    
    def get_extended_attention_mask(
        self, attention_mask: Tensor, input_shape: Tuple[int], device: torch.device = None, dtype: torch.float = torch.float32
    ) -> Tensor:
        """
        Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

        Arguments:
            attention_mask (`torch.Tensor`):
                Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
            input_shape (`Tuple[int]`):
                The shape of the input to the model.

        Returns:
            `torch.Tensor` The extended attention mask, with the same dtype as `attention_mask.dtype`.
        """
        
        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
        # ourselves in which case we just need to make it broadcastable to all heads.
        if attention_mask.dim() == 3:
            extended_attention_mask = attention_mask[:, None, :, :]
        elif attention_mask.dim() == 2:
            # Provided a padding mask of dimensions [batch_size, seq_length]
            # - if the model is a decoder, apply a causal mask in addition to the padding mask
            # - if the model is an encoder, make the mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
            
            extended_attention_mask = attention_mask[:, None, None, :]
        else:
            raise ValueError(
                f"Wrong shape for input_ids (shape {input_shape}) or attention_mask (shape {attention_mask.shape})"
            )

        # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
        # masked positions, this operation will create a tensor which is 0.0 for
        # positions we want to attend and the dtype's smallest value for masked positions.
        # Since we are adding it to the raw scores before the softmax, this is
        # effectively the same as removing these entirely.
        extended_attention_mask = extended_attention_mask.to(dtype=dtype)  # fp16 compatibility
        extended_attention_mask = (1.0 - extended_attention_mask) * torch.finfo(dtype).min
        
        return extended_attention_mask

Hello. I have rechecked the code, and the issue is caused by the data type used during inference. Your solution is a temporary workaround when using FP32 for inference. However, it may not work if FP16 is used for inference.

I will update the code soon to ensure that inference with any data type does not result in errors.

Sorry for the inconvenience.
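
Until that update lands, one dtype-agnostic variant of the same function is sketched below: instead of hardcoding torch.float32, it falls back to the dtype of the model's own parameters, so the additive mask matches the query dtype whether inference runs in FP16 or FP32. This is only an illustration of the idea; the `dtype=None` default is an assumption, not the change that was actually committed.

    import torch
    from torch import Tensor
    from typing import Optional, Tuple

    # Drop-in sketch for the method in FlagEmbedding/FlagEmbedding/visual/modeling.py
    def get_extended_attention_mask(
        self, attention_mask: Tensor, input_shape: Tuple[int],
        device: torch.device = None, dtype: Optional[torch.dtype] = None
    ) -> Tensor:
        # When no dtype is passed, use the dtype of the model's own parameters
        # so the mask always matches the query tensors (FP16 or FP32).
        if dtype is None:
            dtype = next(self.parameters()).dtype

        if attention_mask.dim() == 3:
            extended_attention_mask = attention_mask[:, None, :, :]
        elif attention_mask.dim() == 2:
            extended_attention_mask = attention_mask[:, None, None, :]
        else:
            raise ValueError(
                f"Wrong shape for input_ids (shape {input_shape}) or attention_mask (shape {attention_mask.shape})"
            )

        # 1.0 marks positions to attend, 0.0 masked positions; convert to an
        # additive mask whose masked slots hold the dtype's smallest value.
        extended_attention_mask = extended_attention_mask.to(dtype=dtype)
        extended_attention_mask = (1.0 - extended_attention_mask) * torch.finfo(dtype).min
        return extended_attention_mask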

I have updated the code; it should be fine now.
