---
license: bsd-3-clause-clear
---

# WAFFLE: Multi-Modal Model for Automated Front-End Development
We develop WAFFLE, a fine-tuning approach that trains multi-modal LLMs (MLLMs) to generate HTML code from webpage screenshots or UI designs. WAFFLE uses a structure-aware attention mechanism to improve MLLMs' understanding of HTML's structure, and a contrastive fine-tuning approach to align MLLMs' understanding of UI images and HTML code. Models fine-tuned with WAFFLE achieve up to 9.00 pp (percentage points) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark WebSight-Test and the existing benchmark Design2Code.

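For intuition about the contrastive fine-tuning step, the sketch below shows a generic InfoNCE-style objective that pulls each UI screenshot's embedding toward the embedding of its own HTML and pushes it away from the other HTML documents in the batch. This is an illustration only; the function name, pooled embeddings, and temperature are placeholders, and the actual objective used by WAFFLE is defined in the paper and the training code under `vlm_websight`.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               html_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Illustrative InfoNCE-style loss over pooled (UI image, HTML) embeddings.

    image_emb, html_emb: [batch, dim] pooled embeddings where row i of each
    tensor comes from the same (screenshot, HTML) pair.
    """
    # Normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    html_emb = F.normalize(html_emb, dim=-1)

    # logits[i, j] = similarity between image i and HTML document j.
    logits = image_emb @ html_emb.t() / temperature

    # Within a batch, the i-th image matches the i-th HTML document.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric cross-entropy: image-to-HTML and HTML-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```
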
## Updates
* 10/24/2024: Our preprint is available at [preprint](https://arxiv.org/abs/2410.18362)
* 10/24/2024: Our code (actively maintained) is available at [code](https://github.com/lt-asset/Waffle)
* 10/24/2024: Our fine-tuned Waffle_VLM_WebSight (7B), trained with DoRA, is released at [lt-asset/Waffle_VLM_WebSight](https://huggingface.co/lt-asset/Waffle_VLM_WebSight)

## Dependencies
- peft 0.11.1
- transformers 4.41.1
- pytorch 2.3.0
- selenium
- Python 3.10.14
- deepspeed 0.14.1
- datasets 2.19.1
- beautifulsoup4 4.12.3
- accelerate 0.30.1

## Structure
- `vlm_websight` contains the dataset class file, model class files, and training file for VLM-WebSight.
- `eval_websight.py` is the inference script.
- `dataset.py` is the dataset class file.
- `WebSight-Test` is one of our test datasets.

## Quick Start
```bash
cd vlm_websight
# generate HTML/CSS code for the UI image at --image_path and save the code to --html_path
python quick_start.py --image_path ../WebSight-Test/test-495.png --html_path examples/example-495.html
# render the HTML/CSS code at --html_path and save the rendered image to --image_path
python render_html.py --html_path examples/example-495.html --image_path examples/example-495.png
```

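`quick_start.py` wraps model loading and generation. If you prefer to call the released checkpoint directly from Python, a minimal sketch along the following lines should work, assuming the checkpoint exposes the standard `transformers` `AutoProcessor`/`AutoModelForCausalLM` interface with `trust_remote_code=True`; the prompt format and generation settings below are placeholders, and `quick_start.py` remains the authoritative path.

```python
# Sketch only: load lt-asset/Waffle_VLM_WebSight and generate HTML for one screenshot.
# Assumes the checkpoint follows the standard transformers processor/model interface;
# the prompt string below is a placeholder, not necessarily what quick_start.py uses.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

checkpoint = "lt-asset/Waffle_VLM_WebSight"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device)

image = Image.open("../WebSight-Test/test-495.png").convert("RGB")
# Placeholder prompt; check quick_start.py for the prompt the model was trained with.
inputs = processor(images=[image], text="<image>", return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=2048)

html = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
with open("examples/example-495.html", "w", encoding="utf-8") as f:
    f.write(html)
```
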
## Example
* Input UI design

![test-495.png](WebSight-Test/test-495.png)

* Waffle-VLM-WebSight generated HTML code

[example-495.html](vlm_websight/examples/example-495.html)

* Rendered Waffle-VLM-WebSight output

![example-495.png](vlm_websight/examples/example-495.png)

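The rendered screenshot above is produced by `render_html.py`, which relies on a browser to render the generated HTML. A minimal sketch of that idea with Selenium and headless Chrome is shown below; the window size, wait time, and function name are illustrative, and the repo's script may handle these differently.

```python
# Sketch only: render an HTML file to a PNG with headless Chrome via Selenium.
import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def render_html_to_png(html_path: str, image_path: str,
                       width: int = 1280, height: int = 1024) -> None:
    options = Options()
    options.add_argument("--headless")                    # run without a visible window
    options.add_argument(f"--window-size={width},{height}")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get("file://" + os.path.abspath(html_path))  # load the local HTML file
        time.sleep(2)                                        # crude wait for styles/assets
        driver.save_screenshot(image_path)                   # write the rendered page to a PNG
    finally:
        driver.quit()

render_html_to_png("vlm_websight/examples/example-495.html",
                   "vlm_websight/examples/example-495.png")
```
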
## Citation
```
@misc{liang2024wafflemultimodalmodelautomated,
      title={WAFFLE: Multi-Modal Model for Automated Front-End Development},
      author={Shanchao Liang and Nan Jiang and Shangshu Qian and Lin Tan},
      year={2024},
      eprint={2410.18362},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2410.18362},
}
```