Create README.md
README.md
ADDED
@@ -0,0 +1,26 @@
+---
+{}
+---
+#### Model for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding]()
+
+🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv]()
+
+## Introduction
+We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web, but also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation.
+
+<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>
+
+## Model Performance
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/h1L7J4rLlq6EOtbiXZjZW.png)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/NOVQ8WjgJoRm0bzN9zxFx.png)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/O6GhR1UXOSi7o3yjXvK4e.png)
+
+## Contact
+* Junpeng Liu: [email protected]
+* Xiang Yue: [email protected]
+
+## Citation
+If you find this work helpful, please cite our paper: