Multimodal (text + layout/format + image) pre-training for document AI
LayoutXLM is a multilingual variant of LayoutLMv2.
Documentation for this model is available in the Hugging Face Transformers library.
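Because LayoutXLM is a variant of LayoutLMv2, it takes word bounding boxes alongside text and image. In the Transformers LayoutLMv2 convention, each box is given as (x0, y0, x1, y1) and scaled to a 0-1000 grid relative to the page size. The sketch below shows that scaling step. It is a standalone illustration and assumes pixel-space boxes from an OCR engine. The function name `normalize_bbox` and the page and box values are hypothetical.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 layout grid.

    Hypothetical helper: assumes the page width/height are in the same
    pixel units as the box coordinates.
    """
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = bbox
    # x coordinates are scaled by page width, y coordinates by page height,
    # then truncated to integers on the 0-1000 grid.
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Made-up OCR box for one word on a 2480 x 3508 pixel page (A4 at 300 dpi).
print(normalize_bbox((124, 351, 620, 702), 2480, 3508))  # [50, 100, 250, 200]
```

Normalizing to a fixed grid makes layout positions comparable across scans of different resolutions, so the same 2-D position embeddings can be shared by every page.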
Microsoft Document AI | GitHub
LayoutXLM is a multimodal pre-trained model for multilingual document understanding that aims to bridge language barriers in visually-rich document understanding. Experimental results show that it significantly outperforms existing state-of-the-art (SOTA) cross-lingual pre-trained models on the XFUND dataset.
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei, arXiv Preprint 2021