Commit 93567ee (1 parent: c7b97db), committed by root
README.md CHANGED
---
language: en
license: mit
pipeline_tag: document-question-answering
tags:
- layoutlm
- document-question-answering
- pdf
widget:
- text: "What is the invoice number?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
- text: "What is the purchase amount?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
---

# LayoutLM for Visual Question Answering

This is a fine-tuned version of the multi-modal [LayoutLM](https://aka.ms/layoutlm) model for the task of question answering on documents. It has been fine-tuned on both the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) and [DocVQA](https://www.docvqa.org/) datasets.

## Getting started with the model

To run these examples, you must have [PIL](https://pillow.readthedocs.io/en/stable/installation.html), [pytesseract](https://pypi.org/project/pytesseract/), and [PyTorch](https://pytorch.org/get-started/locally/) installed in addition to [transformers](https://huggingface.co/docs/transformers/index).
```python
from transformers import pipeline

nlp = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

nlp(
    "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

nlp(
    "https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
    "What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}

nlp(
    "https://www.accountingcoach.com/wp-content/uploads/2013/10/[email protected]",
    "What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
```
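By default the pipeline runs OCR (via pytesseract) on the image. The transformers document-question-answering pipeline also accepts pre-computed `word_boxes`, so you can skip OCR if you already have word-level text and coordinates. A minimal sketch of preparing them, assuming hypothetical pixel-space OCR output (the sample words and boxes below are made up for illustration); LayoutLM expects coordinates normalized to a 0-1000 grid regardless of image size:

```python
# Sketch: convert raw OCR output (word, pixel box) into the normalized
# (word, [x0, y0, x1, y1]) pairs that the document-question-answering
# pipeline accepts via its `word_boxes` argument.

def normalize_box(box, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Hypothetical OCR output for a 750x1000-pixel invoice image.
ocr_words = [("Invoice", (40, 30, 160, 55)), ("us-001", (180, 30, 260, 55))]
width, height = 750, 1000

word_boxes = [(w, normalize_box(b, width, height)) for w, b in ocr_words]
print(word_boxes)
# [('Invoice', [53, 30, 213, 55]), ('us-001', [240, 30, 346, 55])]

# Then pass them alongside the image and question:
# nlp(image, "What is the invoice number?", word_boxes=word_boxes)
```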

**NOTE**: This model and pipeline recently landed in transformers via [PR #18407](https://github.com/huggingface/transformers/pull/18407) and [PR #18414](https://github.com/huggingface/transformers/pull/18414), so you'll need to use a recent version of transformers, for example:

```bash
pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991
```

## About us

This model was created by the team at [Impira](https://www.impira.com/).
config.json ADDED
{
  "_name_or_path": "impira/layoutlm-document-qa",
  "architectures": [
    "LayoutLMForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 514,
  "model_type": "layoutlm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "tokenizer_class": "RobertaTokenizer",
  "transformers_version": "4.22.0.dev0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
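As a sanity check, these shapes are consistent with the ~511 MB weight files in this commit: a RoBERTa-base-sized LayoutLM (12 layers, hidden size 768, vocab 50,265) has roughly 127M parameters, or about 508 MB at 4 bytes per float32 weight. A back-of-the-envelope sketch (an approximation that ignores biases and layer norms, not an exact count):

```python
# Rough float32 parameter estimate from the config above (LayoutLM-base).
hidden, layers, inter = 768, 12, 3072
vocab, max_pos, max_2d, type_vocab = 50265, 514, 1024, 1

# Embeddings: word + 1D position + four 2D tables (LayoutLM keeps separate
# x, y, height, and width position embeddings) + token type.
embed = (vocab + max_pos + 4 * max_2d + type_vocab) * hidden

# Each encoder layer: 4 attention projection matrices + 2 feed-forward
# matrices (biases and layer norms omitted; they are a rounding error here).
per_layer = 4 * hidden * hidden + 2 * hidden * inter
total = embed + layers * per_layer

print(f"~{total / 1e6:.0f}M parameters, ~{total * 4 / 1e6:.0f} MB in float32")
# → ~127M parameters, ~508 MB in float32
```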
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e4bbad3e4a1b5ae50c787b7afd6049a0bfa99fd823b50436e444e092ae2347b9
size 511200628
pyproject.toml ADDED
[tool.black]
line-length = 119
target-version = ['py35']
pytorch_model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bf8b7882db23c58763acd876f50d04e3f291c8845febd3556f26b91f4b73f7c9
size 511244837
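The weight files in this commit are stored via Git LFS, so the repository itself only holds small pointer files in the format shown: a spec version line, the SHA-256 object id, and the byte size. A tiny parser sketch for such pointers (the pointer text is copied from this diff):

```python
# Parse a Git LFS pointer file into a dict. Pointers are plain-text
# "key value" lines; the real object lives in LFS storage, addressed
# by its sha256 oid.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bf8b7882db23c58763acd876f50d04e3f291c8845febd3556f26b91f4b73f7c9
size 511244837
"""

def parse_lfs_pointer(text):
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "oid_algo": algo,
            "oid": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(pointer_text)
print(f"{info['size'] / 1e6:.1f} MB object, oid {info['oid'][:12]}...")
# → 511.2 MB object, oid bf8b7882db23...
```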
setup.cfg ADDED
[isort]
default_section = FIRSTPARTY
ensure_newline_before_comments = True
force_grid_wrap = 0
include_trailing_comma = True
known_first_party = transformers

line_length = 119
lines_after_imports = 2
multi_line_output = 3
use_parentheses = True

[flake8]
ignore = E203, E501, E741, W503, W605
max-line-length = 119

[tool:pytest]
doctest_optionflags=NUMBER NORMALIZE_WHITESPACE ELLIPSIS
special_tokens_map.json ADDED
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
tf_model.h5 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:1b79d6d938ef00f3ef9666db0d12907855272a1c476145d1bd8440cfdb97e433
size 511465184
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base", "add_prefix_space": true}
vocab.json ADDED
The diff for this file is too large to render. See raw diff