---
license: bsd-3-clause-clear
---

# WAFFLE: Multi-Modal Model for Automated Front-End Development
We developed WAFFLE, a fine-tuning approach that trains multi-modal LLMs (MLLMs) to generate HTML code from webpage screenshots or UI designs. WAFFLE uses a structure-aware attention mechanism to improve MLLMs' understanding of HTML structure, and a contrastive fine-tuning approach to align MLLMs' understanding of UI images and HTML code. Models fine-tuned with WAFFLE achieve up to 9.00 pp (percentage points) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark WebSight-Test and the existing benchmark Design2Code.
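
As a rough intuition for structure-oriented metrics such as HTML match (the exact definitions are in the paper; the snippet below is an illustrative sketch, not our evaluation code), similarity between two HTML documents can be approximated by comparing their tag sequences:

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect the sequence of opening tags in document order."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html: str) -> list[str]:
    collector = TagCollector()
    collector.feed(html)
    return collector.tags

def rough_structure_similarity(generated: str, reference: str) -> float:
    """Toy structural similarity: overlap ratio of the two tag sequences."""
    return SequenceMatcher(None, tag_sequence(generated), tag_sequence(reference)).ratio()

gen = "<div><h1>Title</h1><p>Text</p></div>"
ref = "<div><h1>Title</h1><ul><li>Item</li></ul></div>"
print(round(rough_structure_similarity(gen, ref), 2))  # → 0.57
```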

## Updates:
* 10/24/2024: Our preprint is available at: [preprint](https://arxiv.org/abs/2410.18362)
* 10/24/2024: Our code (actively maintained) is available at: [code](https://github.com/lt-asset/Waffle)
* 10/24/2024: Our fine-tuned Waffle_VLM_WebSight (7B), trained with DoRA, is released at: [lt-asset/Waffle_VLM_WebSight](https://huggingface.co/lt-asset/Waffle_VLM_WebSight)

## Dependencies
- peft 0.11.1
- transformers 4.41.1
- pytorch 2.3.0
- selenium
- Python 3.10.14
- deepspeed 0.14.1
- datasets 2.19.1
- beautifulsoup4 4.12.3
- accelerate 0.30.1

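The pinned versions above can be installed with pip (a setup sketch, assuming a CUDA-capable environment; note that on PyPI the PyTorch package is named `torch`):

```shell
pip install torch==2.3.0 transformers==4.41.1 peft==0.11.1 \
    deepspeed==0.14.1 datasets==2.19.1 accelerate==0.30.1 \
    beautifulsoup4==4.12.3 selenium
```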
## Structure
- `vlm_websight` contains the dataset class file, model class files, and training file for vlm_websight
- `eval_websight.py` is the inference script
- `dataset.py` is the dataset class file
- `WebSight-Test` is one of our test datasets

## Quick Start
```bash
cd vlm_websight
# generate HTML/CSS code for the UI image at --image_path, saving the code to --html_path
python quick_start.py --image_path ../WebSight-Test/test-495.png --html_path examples/example-495.html
# render the HTML/CSS code at --html_path, saving the rendered image to --image_path
python render_html.py --html_path examples/example-495.html --image_path examples/example-495.png
```

## Example
* Input UI design

![test-495.png](WebSight-Test/test-495.png)

* Waffle-VLM-WebSight generated HTML code

[example-495.html](vlm_websight/examples/example-495.html)

* Rendered Waffle-VLM-WebSight output

![example-495.png](vlm_websight/examples/example-495.png)

## Citation
```bibtex
@misc{liang2024wafflemultimodalmodelautomated,
      title={WAFFLE: Multi-Modal Model for Automated Front-End Development},
      author={Shanchao Liang and Nan Jiang and Shangshu Qian and Lin Tan},
      year={2024},
      eprint={2410.18362},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2410.18362},
}
```