---
license: odc-by
datasets:
- neulab/MultiUI
language:
- en
base_model:
- Qwen/Qwen2-7B-Instruct
tags:
- GUI
- Agent
- Web
- OCR
- Doc
- VQA
---

#### Model for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824)

🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824)

## Introduction

We introduce **MultiUI**, a dataset of 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel at web UI tasks (achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web) but also generalize surprisingly well to non-web UI tasks and even to non-UI domains such as document understanding, OCR, and chart interpretation.

## Training & Evaluation

Model training is based on **[LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT)**. For deployment, refer to the **SGLang deployment** section of the LLaVA-NeXT repo; a minimal inference sketch also appears at the end of this card. For benchmark evaluation, we use the awesome **lmms-eval** package; see our repo **[MultiUI](https://github.com/neulab/multiui)** for instructions on evaluating the benchmarks reported in the paper.

## Model Performance

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/h1L7J4rLlq6EOtbiXZjZW.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/NOVQ8WjgJoRm0bzN9zxFx.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/O6GhR1UXOSi7o3yjXvK4e.png)

## Contact

* Junpeng Liu: jpliu@link.cuhk.edu.hk
* Xiang Yue: xyue2@andrew.cmu.edu

## Citation

If you find this work helpful, please cite our paper:

````
@misc{liu2024harnessingwebpageuistextrich,
      title={Harnessing Webpage UIs for Text-Rich Visual Understanding},
      author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue},
      year={2024},
      eprint={2410.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13824},
}
````
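
## Inference Sketch

Since this checkpoint follows the LLaVA-NeXT recipe with a Qwen2-7B-Instruct backbone, it can be loaded through the LLaVA-NeXT codebase. The sketch below is illustrative rather than official: the checkpoint path placeholder, the `llava_qwen` model-name hint, the `qwen_1_5` conversation template, and the screenshot filename are all assumptions to adapt to your environment.

```python
# Minimal single-image inference sketch, assuming the LLaVA-NeXT codebase
# is installed (e.g., `pip install -e .` from the LLaVA-NeXT repo).
import copy
import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

pretrained = "<path-or-hf-id-of-this-model>"  # placeholder: this card's checkpoint
tokenizer, model, image_processor, _ = load_pretrained_model(
    pretrained, None, "llava_qwen", device_map="auto"  # "llava_qwen" assumed for the Qwen2 backbone
)
model.eval()

# Any webpage screenshot works; the path is illustrative.
image = Image.open("screenshot.png").convert("RGB")
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]

# Build a single-turn prompt with the image token prepended.
conv = copy.deepcopy(conv_templates["qwen_1_5"])  # assumed template for Qwen2-based models
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is the main heading on this page?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For reproducing the paper's benchmark numbers, prefer the lmms-eval harness via the instructions in the MultiUI repo over scripting generation by hand.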