---
license: bsd-3-clause-clear
---

# WAFFLE: Multi-Modal Model for Automated Front-End Development
We develop WAFFLE, a fine-tuning approach that trains multi-modal LLMs (MLLMs) to generate HTML code from webpage screenshots or UI designs. WAFFLE uses a structure-aware attention mechanism to improve MLLMs' understanding of HTML's structure, and a contrastive fine-tuning approach to align MLLMs' understanding of UI images and HTML code. Models fine-tuned with WAFFLE achieve up to 9.00 pp (percentage points) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark WebSight-Test and the existing benchmark Design2Code.

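For intuition about the contrastive fine-tuning step, the sketch below shows a generic InfoNCE-style objective that pulls each UI screenshot's embedding toward the embedding of its own HTML and pushes it away from the other HTML documents in the batch. This is an illustration only; the function name, pooled embeddings, and temperature are placeholders, and the actual objective used by WAFFLE is defined in the paper and the training code under `vlm_websight`.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               html_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Illustrative InfoNCE-style loss over pooled (UI image, HTML) embeddings.

    image_emb, html_emb: [batch, dim] pooled embeddings where row i of each
    tensor comes from the same (screenshot, HTML) pair.
    """
    # Normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    html_emb = F.normalize(html_emb, dim=-1)

    # logits[i, j] = similarity between image i and HTML document j.
    logits = image_emb @ html_emb.t() / temperature

    # Within a batch, the i-th image matches the i-th HTML document.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric cross-entropy: image-to-HTML and HTML-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```
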
## Updates
* 10/24/2024: Our preprint is available at [preprint](https://arxiv.org/abs/2410.18362)
* 10/24/2024: Our code (actively maintained) is available at [code](https://github.com/lt-asset/Waffle)
* 10/24/2024: Our fine-tuned Waffle_VLM_WebSight (7B), trained with DoRA, is released at [lt-asset/Waffle_VLM_WebSight](https://huggingface.co/lt-asset/Waffle_VLM_WebSight)

## Dependencies
- peft 0.11.1
- transformers 4.41.1
- pytorch 2.3.0
- selenium
- Python 3.10.14
- deepspeed 0.14.1
- datasets 2.19.1
- beautifulsoup4 4.12.3
- accelerate 0.30.1

## Structure
- `vlm_websight` contains the dataset class file, model class files, and training file for VLM-WebSight.
- `eval_websight.py` is the inference script.
- `dataset.py` is the dataset class file.
- `WebSight-Test` is one of our test datasets.

## Quick Start
```bash
cd vlm_websight
# generate HTML/CSS code for the UI image at --image_path and save the code to --html_path
python quick_start.py --image_path ../WebSight-Test/test-495.png --html_path examples/example-495.html
# render the HTML/CSS code at --html_path and save the rendered image to --image_path
python render_html.py --html_path examples/example-495.html --image_path examples/example-495.png
```

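`quick_start.py` wraps model loading and generation. If you prefer to call the released checkpoint directly from Python, a minimal sketch along the following lines should work, assuming the checkpoint exposes the standard `transformers` `AutoProcessor`/`AutoModelForCausalLM` interface with `trust_remote_code=True`; the prompt format and generation settings below are placeholders, and `quick_start.py` remains the authoritative path.

```python
# Sketch only: load lt-asset/Waffle_VLM_WebSight and generate HTML for one screenshot.
# Assumes the checkpoint follows the standard transformers processor/model interface;
# the prompt string below is a placeholder, not necessarily what quick_start.py uses.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

checkpoint = "lt-asset/Waffle_VLM_WebSight"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device)

image = Image.open("../WebSight-Test/test-495.png").convert("RGB")
# Placeholder prompt; check quick_start.py for the prompt the model was trained with.
inputs = processor(images=[image], text="<image>", return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=2048)

html = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
with open("examples/example-495.html", "w", encoding="utf-8") as f:
    f.write(html)
```
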
## Example
* Input UI design

![test-495.png](WebSight-Test/test-495.png)

* Waffle-VLM-WebSight generated HTML code

[example-495.html](vlm_websight/examples/example-495.html)

* Rendered Waffle-VLM-WebSight output

![example-495.png](vlm_websight/examples/example-495.png)

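The rendered screenshot above is produced by `render_html.py`, which relies on a browser to render the generated HTML. A minimal sketch of that idea with Selenium and headless Chrome is shown below; the window size, wait time, and function name are illustrative, and the repo's script may handle these differently.

```python
# Sketch only: render an HTML file to a PNG with headless Chrome via Selenium.
import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def render_html_to_png(html_path: str, image_path: str,
                       width: int = 1280, height: int = 1024) -> None:
    options = Options()
    options.add_argument("--headless")                    # run without a visible window
    options.add_argument(f"--window-size={width},{height}")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get("file://" + os.path.abspath(html_path))  # load the local HTML file
        time.sleep(2)                                        # crude wait for styles/assets
        driver.save_screenshot(image_path)                   # write the rendered page to a PNG
    finally:
        driver.quit()

render_html_to_png("vlm_websight/examples/example-495.html",
                   "vlm_websight/examples/example-495.png")
```
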
## Citation
```
@misc{liang2024wafflemultimodalmodelautomated,
      title={WAFFLE: Multi-Modal Model for Automated Front-End Development},
      author={Shanchao Liang and Nan Jiang and Shangshu Qian and Lin Tan},
      year={2024},
      eprint={2410.18362},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2410.18362},
}
```