Update README.md
LayoutLM [1] is an excellent solution to these problems because, at its core, it is a regular BERT-like model, but it is uniquely capable of embedding positional information about the text alongside the text itself.
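To make that input format concrete, here is a minimal sketch using the stock Hugging Face LayoutLM API (not this repository's code); the example words and their 0-1000-normalised bounding boxes are made up:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Invoice", "Total:", "$42.00"]  # made-up example content
boxes = [[48, 84, 156, 108], [48, 500, 118, 524], [130, 500, 210, 524]]  # (x0, y0, x1, y1), 0-1000 scale

# Expand each word's box to its sub-tokens, then add [CLS]/[SEP] boxes.
input_ids, token_boxes = [tokenizer.cls_token_id], [[0, 0, 0, 0]]
for word, box in zip(words, boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    input_ids += ids
    token_boxes += [box] * len(ids)
input_ids += [tokenizer.sep_token_id]
token_boxes += [[1000, 1000, 1000, 1000]]

# The bbox tensor rides alongside input_ids: layout is embedded with the text.
out = model(input_ids=torch.tensor([input_ids]), bbox=torch.tensor([token_boxes]))
print(out.last_hidden_state.shape)  # (1, num_tokens, 768)
```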
We have fine-tuned the model on the DocVQA [2] dataset, showing the potential improvement upon the current SOTA:

| Model                            | HR@3   | HR@5       | HR@10      |
|----------------------------------|--------|------------|------------|
| LayoutLM-Byne (our model)        | 0.3491 | **0.4269** | **0.5436** |
| Improvement over best competitor | -1.61% | +5.62%     | +18.87%    |
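Here, HR@k is the hit rate at k: the fraction of questions whose gold page appears among the top-k retrieved pages. A minimal sketch of the metric, assuming a single gold page per question:

```python
def hit_rate_at_k(ranked_page_ids: list[str], gold_page_id: str, k: int) -> float:
    """1.0 if the gold page is among the top-k retrieved pages, else 0.0."""
    return float(gold_page_id in ranked_page_ids[:k])

# The reported HR@k is the mean over the evaluation set.
def mean_hr_at_k(rankings: list[list[str]], golds: list[str], k: int) -> float:
    return sum(hit_rate_at_k(r, g, k) for r, g in zip(rankings, golds)) / len(golds)
```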
It is important to highlight that the model is still in alpha, so further work is required to reveal its full potential.
### Usage
Please refer to the [Colab workbook](https://colab.research.google.com/drive/1YkPtCOrXdDMTv_gm14VoZeofJoNRotzO?usp=sharing) or the blog post to learn more!
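For a quick sense of how retrieval with the model might look, here is a rough bi-encoder sketch in the spirit of Sentence-BERT [3]; the checkpoint path is a placeholder and the mean-pooling recipe is our assumption, so treat the workbook as authoritative:

```python
import torch
import torch.nn.functional as F
from transformers import LayoutLMTokenizer, LayoutLMModel

# Placeholder checkpoint path -- the actual fine-tuned model id is given in the workbook.
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("path/to/layoutlm-byne-checkpoint").eval()

@torch.no_grad()
def embed(input_ids: torch.Tensor, bbox: torch.Tensor) -> torch.Tensor:
    # Mean-pool the last hidden states into one L2-normalised vector per sequence,
    # a Sentence-BERT-style [3] pooling; the exact recipe here is an assumption.
    hidden = model(input_ids=input_ids, bbox=bbox).last_hidden_state
    return F.normalize(hidden.mean(dim=1), dim=-1)

# A question has no layout of its own, so give every token a neutral full-page box.
enc = tokenizer("What is the total due?", return_tensors="pt")
q_boxes = torch.tensor([[[0, 0, 1000, 1000]] * enc["input_ids"].shape[1]])
question_vec = embed(enc["input_ids"], q_boxes)

# Encode each page with its real word boxes (as in the earlier snippet), stack the
# vectors, and rank pages by cosine similarity -- the dot product of unit vectors:
#   scores = page_vecs @ question_vec.squeeze(0); scores.topk(10).indices
```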
### Get in touch

Reach out to [[email protected]](mailto:[email protected]) if you'd like help with deploying the model in a commercial setting.
[1] Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., & Zhou, M. (2020). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1192-1200).

[2] Mathew, M., Karatzas, D., & Jawahar, C. V. (2021). DocVQA: A Dataset for VQA on Document Images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2200-2209).

[3] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982-3992).