gitlost-murali/pix2struct-refexp-large

gitlost-murali committed on Jul 1, 2023

Commit 0b170b2 • 1 Parent(s): c2e43c2

Update README.md

Files changed (1)

README.md +6 -0

@@ -27,6 +27,12 @@ tags:
 # TL;DR
+## Details for Pix2Struct-RefExp (based on the Pix2Struct repository's [pre-processing](https://github.com/google-research/pix2struct/blob/main/pix2struct/preprocessing/convert_refexp.py))
+-> __Input__: An image with a bounding box drawn around a candidate object and a header containing the referring expression (stored in the `image` feature).
+-> __Output__: A boolean flag (the `parse` feature) indicating whether the candidate object is the correct referent of the referring expression.
 Pix2Struct is an image-encoder, text-decoder model trained on image-text pairs for various tasks, including image captioning and visual question answering. The full list of available models can be found in Table 1 of the paper:
 ![Table 1 - paper](https://s3.amazonaws.com/moonup/production/uploads/1678712985040-62441d1d9fdefb55a0b7d12c.png)
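
The input/output format described in the added lines can be sketched as a small data-preparation step: one referring expression with several candidate boxes becomes one binary example per candidate. This is a minimal illustration, not the code from `convert_refexp.py`. The names `RefExpExample`, `build_examples`, and `pick_referent` are hypothetical, and the `"true"`/`"false"` spelling of the parse text is an assumption. Drawing the box and rendering the header onto the image are left to an image library.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration; not the names used in convert_refexp.py.
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class RefExpExample:
    image_path: str  # screenshot the candidate boxes belong to
    box: Box         # candidate box to draw on the image
    header: str      # referring expression to render as the image header
    parse: str       # target text: "true" or "false" (assumed spelling)


def build_examples(image_path: str, expression: str,
                   candidates: List[Box], target_index: int) -> List[RefExpExample]:
    """Turn one referring expression into one binary example per candidate."""
    return [
        RefExpExample(image_path, box, expression,
                      "true" if i == target_index else "false")
        for i, box in enumerate(candidates)
    ]


def pick_referent(predictions: List[str]) -> int:
    """Return the index of the first candidate predicted "true", or -1 if none."""
    for i, text in enumerate(predictions):
        if text.strip().lower() == "true":
            return i
    return -1


examples = build_examples(
    "screen.png", "click on the search icon",
    [(0, 0, 50, 50), (60, 0, 110, 50), (120, 0, 170, 50)],
    target_index=1,
)
print([e.parse for e in examples])  # ['false', 'true', 'false']
```

At inference time the same framing would apply in reverse: the model is queried once per candidate, and a helper such as `pick_referent` selects the candidate whose decoded output reads as true.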