Missing example for running the model

#2
by molntamas - opened

This model needs a bounding box to specify which widget to describe.
But there is no example for this on the model card.
It is unclear how the bounding box should be specified.

As I understand it, the code should look something like this:

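from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-widget-captioning-base")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-widget-captioning-base")

# Placeholder path: load the screenshot that contains the widget
image = Image.open("screenshot.png")

# Unknown: how the bounding box should be specified
question = "? bounding box ?"

# Preprocess the screenshot and the prompt into model inputs
inputs = processor(images=image, text=question, return_tensors="pt")

# Generate the caption and decode it to text
predictions = model.generate(**inputs)
print(processor.decode(predictions[0], skip_special_tokens=True))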

Same issue here.
The model seems to return the same caption regardless of the bounding box.

Has anyone solved it yet?
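
One thing that may be worth trying: as far as I can tell, the Pix2Struct paper describes the widget captioning input as a screenshot annotated with a single bounding box, which suggests the box is drawn onto the image itself rather than passed as text. The following is only a sketch under that assumption. The helper name `draw_widget_box`, the red color, and the line width are my own guesses, not anything documented on the model card.

```python
from PIL import Image, ImageDraw


def draw_widget_box(image, box, color=(255, 0, 0), width=3):
    """Return a copy of `image` with `box` outlined.

    `box` is (left, top, right, bottom) in pixels. The color and line
    width are assumptions; the model card does not document them.
    """
    annotated = image.convert("RGB").copy()  # never modify the caller's image
    ImageDraw.Draw(annotated).rectangle(box, outline=color, width=width)
    return annotated


# Stand-in for a real screenshot; replace with Image.open(...) on your own file
screenshot = Image.new("RGB", (200, 100), "white")
annotated = draw_widget_box(screenshot, (20, 10, 120, 60))
```

The annotated image would then go to the processor in place of the raw screenshot, for example `processor(images=annotated, return_tensors="pt")`. Whether a text prompt is also needed for this checkpoint is something I cannot confirm.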
