Update README.md
LayoutLM [1] is an excellent solution to these problems because, at its core, it is a regular BERT-like model, but it is uniquely capable of embedding positional information about the text alongside the text itself.
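To make that input format concrete, here is a minimal sketch using the stock Hugging Face LayoutLM API (not this repository's code); the example words and their 0-1000-normalised bounding boxes are made up:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Invoice", "Total:", "$42.00"]  # made-up example content
boxes = [[48, 84, 156, 108], [48, 500, 118, 524], [130, 500, 210, 524]]  # (x0, y0, x1, y1), 0-1000 scale

# Expand each word's box to its sub-tokens, then add [CLS]/[SEP] boxes.
input_ids, token_boxes = [tokenizer.cls_token_id], [[0, 0, 0, 0]]
for word, box in zip(words, boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    input_ids += ids
    token_boxes += [box] * len(ids)
input_ids += [tokenizer.sep_token_id]
token_boxes += [[1000, 1000, 1000, 1000]]

# The bbox tensor rides alongside input_ids: layout is embedded with the text.
out = model(input_ids=torch.tensor([input_ids]), bbox=torch.tensor([token_boxes]))
print(out.last_hidden_state.shape)  # (1, num_tokens, 768)
```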
We have fine-tuned the model on the DocVQA [2] dataset, showing the potential improvement upon the current SOTA:

| Model                            | HR@3   | HR@5       | HR@10      |
|----------------------------------|--------|------------|------------|
| LayoutLM-Byne (our model)        | 0.3491 | **0.4269** | **0.5436** |
| Improvement over best competitor | -1.61% | +5.62%     | +18.87%    |
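Here, HR@k is the hit rate at k: the fraction of questions whose gold page appears among the top-k retrieved pages. A minimal sketch of the metric, assuming a single gold page per question:

```python
def hit_rate_at_k(ranked_page_ids: list[str], gold_page_id: str, k: int) -> float:
    """1.0 if the gold page is among the top-k retrieved pages, else 0.0."""
    return float(gold_page_id in ranked_page_ids[:k])

# The reported HR@k is the mean over the evaluation set.
def mean_hr_at_k(rankings: list[list[str]], golds: list[str], k: int) -> float:
    return sum(hit_rate_at_k(r, g, k) for r, g in zip(rankings, golds)) / len(golds)
```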
It is important to highlight that the model is still in alpha, so further work is required to reveal its full potential.
### Usage
Please refer to the [Colab workbook](https://colab.research.google.com/drive/1YkPtCOrXdDMTv_gm14VoZeofJoNRotzO?usp=sharing) or the blog post to learn more!
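For a quick sense of how retrieval with the model might look, here is a rough bi-encoder sketch in the spirit of Sentence-BERT [3]; the checkpoint path is a placeholder and the mean-pooling recipe is our assumption, so treat the workbook as authoritative:

```python
import torch
import torch.nn.functional as F
from transformers import LayoutLMTokenizer, LayoutLMModel

# Placeholder checkpoint path -- the actual fine-tuned model id is given in the workbook.
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("path/to/layoutlm-byne-checkpoint").eval()

@torch.no_grad()
def embed(input_ids: torch.Tensor, bbox: torch.Tensor) -> torch.Tensor:
    # Mean-pool the last hidden states into one L2-normalised vector per sequence,
    # a Sentence-BERT-style [3] pooling; the exact recipe here is an assumption.
    hidden = model(input_ids=input_ids, bbox=bbox).last_hidden_state
    return F.normalize(hidden.mean(dim=1), dim=-1)

# A question has no layout of its own, so give every token a neutral full-page box.
enc = tokenizer("What is the total due?", return_tensors="pt")
q_boxes = torch.tensor([[[0, 0, 1000, 1000]] * enc["input_ids"].shape[1]])
question_vec = embed(enc["input_ids"], q_boxes)

# Encode each page with its real word boxes (as in the earlier snippet), stack the
# vectors, and rank pages by cosine similarity -- the dot product of unit vectors:
#   scores = page_vecs @ question_vec.squeeze(0); scores.topk(10).indices
```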
### Get in touch

Reach out to [[email protected]](mailto:[email protected]) if you'd like help with deploying the model in a commercial setting.
[1] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).

[2] Mathew, M., Karatzas, D., & Jawahar, C. V. (2021). DocVQA: A Dataset for VQA on Document Images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2200-2209).

[3] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982-3992).