
This is a GenerativeImage2Text (GIT) model fine-tuned on non-text images extracted from documents (e.g. PDFs). It analyzes the content of an image and produces a descriptive caption. It is part of a project to build a software solution that processes offline documents (PDF, Word, PowerPoint, etc.) to detect WCAG accessibility issues.

Example: for a non-text image extracted from a document, the model generates the caption "Indication of correct signature".
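
A minimal usage sketch, assuming the standard GIT captioning interface from the transformers library. The model id `Caraaaaa/text_image_captioning` is taken from this card, and the image path is a hypothetical example:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Model id as listed on this card (assumption: it exposes the standard GIT interface).
model_id = "Caraaaaa/text_image_captioning"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical path to a non-text image extracted from a document (e.g. a PDF figure).
image = Image.open("extracted_image.png").convert("RGB")

# Encode the image and generate a descriptive caption.
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```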

Model size: 177M parameters (Safetensors, F32)

Dataset used to train Caraaaaa/text_image_captioning