|
--- |
|
license: odc-by |
|
datasets: |
|
- neulab/MultiUI |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen2-7B-Instruct |
|
tags: |
|
- GUI |
|
- Agent |
|
- Web |
|
- OCR |
|
- Doc |
|
- VQA |
|
--- |
|
#### Model for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824) |
|
|
|
🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824) |
|
|
|
## Introduction |
|
We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web. They also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation. |
|
|
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video> |
|
|
|
## Model Performance |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/h1L7J4rLlq6EOtbiXZjZW.png) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/NOVQ8WjgJoRm0bzN9zxFx.png) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/O6GhR1UXOSi7o3yjXvK4e.png) |
|
|
|
## Contact |
|
* Junpeng Liu: [email protected] |
|
* Xiang Yue: [email protected] |
|
|
|
## Citation |
|
If you find this work helpful, please cite our paper: |
|
```` |
|
@misc{liu2024harnessingwebpageuistextrich, |
|
title={Harnessing Webpage UIs for Text-Rich Visual Understanding}, |
|
author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue}, |
|
year={2024}, |
|
eprint={2410.13824}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2410.13824}, |
|
} |
|
```` |