---
license: cc-by-nc-sa-4.0
---

# LayoutXLM

**Multimodal (text + layout/format + image) pre-training for document AI**

Microsoft Document AI | GitHub

## Introduction

LayoutXLM is a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding. Experimental results show that it significantly outperforms the existing SOTA cross-lingual pre-trained models on the XFUN dataset.
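Below is a minimal usage sketch for this checkpoint with the Transformers library. It assumes the hub id `microsoft/layoutxlm-base` and a hypothetical local page image `document.png`; note that LayoutXLM is served through the LayoutLMv2 model classes, that the visual backbone requires `detectron2`, and that the processor's built-in OCR requires `pytesseract`.

```python
from PIL import Image
from transformers import LayoutLMv2Model, LayoutXLMProcessor

# LayoutXLM shares the LayoutLMv2 architecture, so the base model is loaded
# through LayoutLMv2Model; the processor handles OCR + tokenization + layout.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base")
model = LayoutLMv2Model.from_pretrained("microsoft/layoutxlm-base")

# "document.png" is a placeholder for any scanned document page.
image = Image.open("document.png").convert("RGB")

# With the default apply_ocr=True, the processor runs Tesseract to get words
# and bounding boxes, then produces input_ids, bbox, attention_mask and image.
encoding = processor(image, return_tensors="pt")

# Forward pass returns contextualized embeddings for text and visual tokens.
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)
```

For downstream tasks such as form understanding on XFUN, the same processor can be paired with a token-classification head (e.g. `LayoutLMv2ForTokenClassification`) fine-tuned on labeled documents.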

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei, arXiv Preprint 2021