DocGraphLM: Documental Graph Language Model for Information Extraction
Abstract
Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two classes of architectures have emerged: transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. DocGraphLM predicts both the direction and the distance between nodes, using a convergent joint loss function that prioritizes neighborhood restoration and down-weights distant node detection. Our experiments on three SotA datasets show consistent improvements on information extraction (IE) and question answering (QA) tasks with the adoption of graph features. Moreover, we find that adopting graph features accelerates convergence during training, even though these features are constructed solely through link prediction.
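To make the abstract's link-prediction objective concrete, here is a minimal sketch (not the authors' released code) of how a joint direction-plus-distance loss with down-weighting of distant node pairs could look. The module and function names, the 8-way direction labeling, and the exponential decay weighting are all assumptions for illustration; the paper only states that the loss prioritizes neighborhood restoration and down-weights distant nodes.

```python
# Hypothetical sketch of a joint link-prediction objective over document-graph
# node pairs: classify the direction between two nodes and regress their
# distance, down-weighting pairs that are far apart so nearby-neighborhood
# reconstruction dominates the loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinkPredictionHead(nn.Module):
    """Assumed head over joint text+layout node embeddings."""

    def __init__(self, hidden_dim: int, num_directions: int = 8):
        super().__init__()
        self.direction_clf = nn.Linear(2 * hidden_dim, num_directions)
        self.distance_reg = nn.Linear(2 * hidden_dim, 1)

    def forward(self, src_emb: torch.Tensor, dst_emb: torch.Tensor):
        pair = torch.cat([src_emb, dst_emb], dim=-1)
        return self.direction_clf(pair), self.distance_reg(pair).squeeze(-1)


def joint_link_loss(dir_logits, dist_pred, dir_labels, dist_labels, alpha=1.0):
    """Direction cross-entropy + distance regression, weighted per pair.

    Pairs with larger ground-truth distance receive exponentially smaller
    weight (an assumed weighting scheme), so the model focuses on restoring
    each node's local neighborhood.
    """
    weights = torch.exp(-dist_labels)  # near pairs ~1, distant pairs ~0
    dir_loss = F.cross_entropy(dir_logits, dir_labels, reduction="none")
    dist_loss = F.smooth_l1_loss(dist_pred, dist_labels, reduction="none")
    return (weights * (dir_loss + alpha * dist_loss)).mean()
```

In this sketch, the node embeddings fed to `LinkPredictionHead` would come from the joint text-and-layout encoder the abstract mentions; `alpha` simply balances the two terms and is not a value taken from the paper.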
Community
Is there a repo for this code?
Thank you for sharing, great idea. Reminds me of PICK but quite different.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- DocLLM: A layout-aware generative language model for multimodal document understanding (2023)
- ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science (2023)
- An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction (2024)
- HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding (2023)
- Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs (2023)