The attention mask is not set and cannot be inferred from input because pad token is same as eos token
I am testing this model for image description. The description is generated successfully, but with an error message:
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
I am using this code:
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
model_id = "vikhyatk/moondream2"
revision = "2024-07-23"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
image = Image.open('test.jpg')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
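For reference, this is the generic transformers pattern the warning is asking for: tokenize the prompt yourself and pass the resulting attention_mask (and an explicit pad_token_id) to generate. It only illustrates the warning and is not a way to call moondream2's answer_question helper, which handles tokenization internally; the checkpoint name below is a placeholder.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM whose pad token equals its eos token
# emits the same warning when generate() is called without an explicit mask.
tok = AutoTokenizer.from_pretrained("some-causal-lm")
lm = AutoModelForCausalLM.from_pretrained("some-causal-lm")

inputs = tok("Describe this image.", return_tensors="pt")
output = lm.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # explicit mask, so it need not be inferred
    pad_token_id=tok.eos_token_id,            # make the pad id explicit as well
    max_new_tokens=128,
)
print(tok.decode(output[0], skip_special_tokens=True))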
How did you fix the error?
I think it's just a warning, doesn't seem to affect accuracy. Haven't had a chance to look into what's causing it.
I have the same messages.
Even when I set flash_attention_2, I still get the "The attention mask is not set ..." message:
import torch
from transformers import AutoModelForCausalLM

# MODEL_NAME and CACHE_DIR are defined elsewhere in my setup
self.model_moondream = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    cache_dir=CACHE_DIR,
)
The warning should stop popping up in the latest version; please reopen if you're still seeing it after updating.
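If your load call pins an older revision, as in the snippet at the top of the thread, updating means pointing at a newer snapshot or dropping the pin so that from_pretrained resolves the repo's current main branch, which per the comment above should carry the fix. A minimal sketch (pinning a specific newer revision is still better for reproducibility):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Without a revision pin, from_pretrained pulls the latest remote code from
# the repo's main branch.
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")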