metadata
language: en
thumbnail: >-
https://uploads-ssl.webflow.com/5e3898dff507782a6580d710/614a23fcd8d4f7434c765ab9_logo.png
license: mit
LayoutLM for Visual Question Answering
This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents. It has been fine-tuned on
Model details
The LayoutLM model was developed at Microsoft (paper) as a general-purpose tool for understanding documents. This model is a checkpoint of LayoutLM-Base-Cased, fine-tuned on both the SQuAD2.0 and DocVQA datasets.
Getting started with the model
To run these examples, you must have PIL (distributed as the Pillow package), pytesseract, and PyTorch installed in addition to transformers.
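Before running the examples, it can help to confirm that these dependencies are importable. The following is a minimal sketch, not part of the model's own tooling: the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative assumptions. It also checks for the `tesseract` executable, since pytesseract is a wrapper around the Tesseract OCR program and needs it on the PATH.

```python
import importlib.util
import shutil

# Import names for the packages listed above (Pillow is imported as "PIL",
# PyTorch as "torch"). This list is an assumption made for illustration.
REQUIRED_MODULES = ["PIL", "pytesseract", "torch", "transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))

# pytesseract calls the Tesseract OCR command-line program, so the binary
# has to be installed separately and be discoverable on PATH.
if shutil.which("tesseract") is None:
    print("The tesseract executable was not found on PATH.")
```

If nothing is printed, the Python packages and the OCR binary were all found.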
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(
    "impira/layoutlm-document-qa",
    add_prefix_space=True,
    trust_remote_code=True,
)

nlp = pipeline(
    model="impira/layoutlm-document-qa",
    tokenizer=tokenizer,
    trust_remote_code=True,
)

nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
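Each call returns a dictionary with a `score`, the `answer` text, and `start` and `end` positions. The sketch below shows one way to post-process such a result, using the sample outputs copied from the examples above so that it runs without downloading the model. The helper names and the 0.9 threshold are assumptions chosen for illustration. Treating `start` and `end` as inclusive word indices is also an assumption, inferred from the samples (the one-word answer `us-001` spans 15 to 15, the two-word answer `$ 3,750` spans 19 to 20) rather than stated in this card.

```python
from typing import Optional


def accept_answer(result: dict, min_score: float = 0.9) -> Optional[str]:
    """Return the answer text if its score clears the threshold, else None.

    The default threshold of 0.9 is an arbitrary illustrative choice.
    """
    if result["score"] >= min_score:
        return result["answer"]
    return None


def answer_word_count(result: dict) -> int:
    """Number of words in the answer, assuming inclusive word indices."""
    return result["end"] - result["start"] + 1


# Sample outputs copied from the examples above.
invoice = {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}
net_sales = {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}

print(accept_answer(invoice))        # us-001
print(accept_answer(net_sales))      # None
print(answer_word_count(net_sales))  # 2
```

A lower-scoring result such as the net sales answer may still be correct, so a rejected answer is probably better routed to manual review than discarded.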
NOTE: This model relies on a model definition and pipeline that are currently under review for inclusion in the transformers project. In the meantime, you must pass the trust_remote_code=True flag to run this model.
About us
This model was created by the team at Impira.