chaojiemao committed on
Commit
10556e4
•
1 Parent(s): c57b5a0

Update README.md

Files changed (1)
  1. README.md +134 -52
README.md CHANGED
@@ -1,52 +1,134 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- tags:
6
- - Diffusion Transformer
7
- - Image Editing
8
- - Scepter
9
- - ACE
10
- ---
11
- <h2 align="center">
12
- ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
13
- </h2>
14
-
15
- <h3 align="center">
16
- <b>Tongyi Lab, Alibaba Group</b>
17
- </h3>
18
-
19
- <div align="center">
20
-
21
- [**Paper**](https://arxiv.org/abs/2410.00086) **|** [**Project Page**](https://ali-vilab.github.io/ace-page/) **|** [**Code**](https://github.com/ali-vilab/ACE)
22
-
23
- </div>
24
-
25
-
26
- ACE is a unified foundational model framework that supports a wide range of visual generation tasks.
27
- By defining CU for unifying multi-modal inputs across different tasks and incorporating long-context CU,
28
- we introduce historical contextual information into visual generation tasks, paving
29
- the way for ChatGPT-like dialog systems in visual generation.
30
-
31
- <p>
32
- <table align="center">
33
- <tr>
34
- <td>
35
- <img src="assets/figures/teaser.png">
36
- </td>
37
- </tr>
38
- </table>
39
- </p>
40
-
41
-
42
-
43
- ## BibTeX
44
-
45
- ```bibtex
46
- @article{han2024ace,
47
- title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
48
- author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
49
- journal={arXiv preprint arXiv:2410.00086},
50
- year={2024}
51
- }
52
- ```
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ tags:
6
+ - Diffusion Transformer
7
+ - Image Editing
8
+ - Scepter
9
+ - ACE
10
+ ---
11
+ <h2 align="center">
12
+ ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
13
+ </h2>
14
+
15
+ <h3 align="center">
16
+ <b>Tongyi Lab, Alibaba Group</b>
17
+ </h3>
18
+
19
+ <div align="center">
20
+
21
+ [**Paper**](https://arxiv.org/abs/2410.00086) **|** [**Project Page**](https://ali-vilab.github.io/ace-page/) **|** [**Code**](https://github.com/ali-vilab/ACE)
22
+
23
+ </div>
24
+
25
+
26
+ ACE is a unified foundational model framework that supports a wide range of visual generation tasks.
27
+ By defining CU for unifying multi-modal inputs across different tasks and incorporating long-context CU,
28
+ we introduce historical contextual information into visual generation tasks, paving
29
+ the way for ChatGPT-like dialog systems in visual generation.
30
+
31
+ <p>
32
+ <table align="center">
33
+ <tr>
34
+ <td>
35
+ <img src="assets/figures/teaser.png">
36
+ </td>
37
+ </tr>
38
+ </table>
39
+ </p>
40
+
41
+
42
+ ## 📢 News
43
+ * **[2024.9.30]** Released the ACE paper on arXiv.
44
+ * **[2024.10.31]** Released the ACE checkpoint on [ModelScope](https://www.modelscope.cn/models/iic/ACE-0.6B-512px) and [HuggingFace](https://huggingface.co/scepter-studio/ACE-0.6B-512px).
45
+ * **[2024.11.1]** Added an online demo on [HuggingFace](https://huggingface.co/spaces/scepter-studio/ACE-Chat).
46
+ * **[2024.11.20]** Released the [ACE-0.6B-1024px](https://huggingface.co/scepter-studio/ACE-0.6B-1024px) model,
47
+ which significantly enhances image generation quality compared with [ACE-0.6B-512px](https://huggingface.co/scepter-studio/ACE-0.6B-512px).
48
+
49
+
50
+ ## 🚀 Installation
51
+ Clone the repository and install the required packages with `pip`:
52
+ ```bash
53
+ git clone https://github.com/ali-vilab/ACE.git && cd ACE
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ ## 🔥 ACE Models
58
+ | **Model** | **Status** |
59
+ |:----------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
60
+ | ACE-0.6B-512px | [![Demo link](https://img.shields.io/badge/Demo-ACE_Chat-purple)](https://huggingface.co/spaces/scepter-studio/ACE-Chat)<br>[![ModelScope link](https://img.shields.io/badge/ModelScope-Model-blue)](https://www.modelscope.cn/models/iic/ACE-0.6B-512px) [![HuggingFace link](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow)](https://huggingface.co/scepter-studio/ACE-0.6B-512px) |
61
+ | ACE-0.6B-1024px | [![Demo link](https://img.shields.io/badge/Demo-ACE_Refiner_Chat-purple)](https://huggingface.co/spaces/scepter-studio/ACE-Refiner-Chat)<br>[![ModelScope link](https://img.shields.io/badge/ModelScope-Model-blue)](https://www.modelscope.cn/models/iic/ACE-0.6B-1024px) [![HuggingFace link](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow)](https://huggingface.co/scepter-studio/ACE-0.6B-1024px) |
62
+ | ACE-12B-FLUX-dev | Coming Soon |
63
+
64
+ ## 🖼 Model Performance Visualization
65
+
66
+ ACE currently has 0.6B parameters, which limits the quality of the images it generates. [FLUX.1-Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), on the other hand,
67
+ has a significant advantage in text-to-image generation quality. By using SDEdit, we can leverage the generative capabilities of FLUX to further enhance the images generated by ACE. Based on these considerations, we designed the ACE-Refiner pipeline, shown in the diagram below.
68
+
69
+ ![ACE_REFINER](assets/ace_method/ace_refiner_process.webp)
70
+
71
+ As shown in the figure below, when the strength
72
+ σ is high, the generated image loses fidelity to the original image. Conversely, a lower
73
+ σ does not significantly improve image quality. Users can therefore trade off fidelity to the generated result against image quality according to their needs.
74
+ Users can set the value of `REFINER_SCALE` in the configuration file `config/inference_config/models/ace_0.6b_1024_refiner.yaml`.
75
+ We recommend that users use the advanced options in the [webui-demo](#-chat-bot-) to verify the effect.
76
+
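The fidelity/quality trade-off described above can be illustrated with a toy version of the SDEdit perturbation step. This is a minimal sketch under stated assumptions, not the ACE-Refiner implementation: it assumes a linear (rectified-flow style) blend between the ACE output and Gaussian noise, and the names `sdedit_perturb` and `refine_steps` are hypothetical. How `REFINER_SCALE` maps onto σ inside the pipeline is not stated here.

```python
import random


def sdedit_perturb(latent, strength, rng):
    """Blend a clean latent with Gaussian noise (assumed linear schedule).

    strength = 0 keeps the input untouched; strength = 1 replaces it with noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [(1.0 - strength) * x + strength * rng.gauss(0.0, 1.0) for x in latent]


def refine_steps(total_steps, strength):
    """Number of denoising steps a refiner would run from the perturbed latent."""
    return round(total_steps * strength)


rng = random.Random(0)
ace_latent = [0.2, -0.5, 0.9]  # stand-in values for an ACE output
for strength in (0.0, 0.3, 1.0):
    noisy = sdedit_perturb(ace_latent, strength, rng)
    print(strength, refine_steps(28, strength), noisy)
```

With a high strength, most of the ACE output is overwritten by noise, so the refiner is free to change the content. With a low strength, few denoising steps run and the result stays close to the input.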
77
+ ![ACE_REFINER_EXAMPLE](assets/ace_method/ace_refiner.webp)
78
+
79
+
80
+ We compared the generation and editing performance of different models on several tasks, as shown below.
81
+ ![Samples](assets/ace_method/samples_compare.webp)
82
+
83
+
84
+ ## 🔥 Training
85
+
86
+ We offer a demonstration training YAML that enables the end-to-end training of ACE using a toy dataset. For a comprehensive overview of the hyperparameter configurations, please consult `config/ace_0.6b_512_train.yaml`.
87
+
88
+ ### Prepare datasets
89
+
90
+ Please find the dataset class located in `modules/data/dataset/dataset.py`,
91
+ designed to facilitate end-to-end training using an open-source toy dataset.
92
+ Download the dataset zip file from [ModelScope](https://www.modelscope.cn/models/iic/scepter/resolve/master/datasets/hed_pair.zip), and then extract its contents into the `cache/datasets/` directory.
93
+
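The download-and-extract step can also be scripted. The following is a minimal sketch using only the Python standard library. The helper names `download` and `extract` and the temporary archive location are assumptions made for illustration; only the dataset URL and the `cache/datasets/` target come from the instructions above.

```python
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = (
    "https://www.modelscope.cn/models/iic/scepter/resolve/master/datasets/hed_pair.zip"
)


def download(url, zip_path):
    """Fetch the archive to zip_path, creating parent directories as needed."""
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract(zip_path, dest_dir="cache/datasets"):
    """Unpack the archive into dest_dir and return the sorted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

Run from the repository root, `extract(download(DATASET_URL, "cache/hed_pair.zip"))` would place the toy dataset under `cache/datasets/`.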
94
+ To prepare your own dataset, consult `modules/data/dataset/dataset.py` for the required data format.
95
+
96
+ ### Prepare initial weights
97
+ The ACE checkpoint has been uploaded to both ModelScope and HuggingFace platforms:
98
+ * [ModelScope](https://www.modelscope.cn/models/iic/ACE-0.6B-512px)
99
+ * [HuggingFace](https://huggingface.co/scepter-studio/ACE-0.6B-512px)
100
+
101
+ In the provided training YAML configuration, the ModelScope URL is the default checkpoint URL. To switch to Hugging Face, modify the PRETRAINED_MODEL value in the YAML file (replace the prefix "ms://iic" with "hf://scepter-studio").
102
+
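The prefix replacement described above is a plain text substitution. As a minimal sketch, the hypothetical helper below rewrites only lines that mention `PRETRAINED_MODEL`. The sample checkpoint path is a placeholder, not the real value from the training YAML.

```python
def switch_checkpoint_source(
    yaml_text, old_prefix="ms://iic", new_prefix="hf://scepter-studio"
):
    """Swap the checkpoint URL prefix on PRETRAINED_MODEL lines only."""
    updated = []
    for line in yaml_text.splitlines(keepends=True):
        if "PRETRAINED_MODEL" in line:
            line = line.replace(old_prefix, new_prefix)
        updated.append(line)
    return "".join(updated)


# Hypothetical config fragment; the real checkpoint path lives in the training YAML.
sample = "PRETRAINED_MODEL: ms://iic/HYPOTHETICAL/checkpoint/path\n"
print(switch_checkpoint_source(sample), end="")
```

Working on the raw text rather than a parsed YAML tree keeps comments and formatting in the config file unchanged.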
103
+
104
+ ### Start training
105
+
106
+ Start training by running the following command:
107
+ ```bash
108
+ # ACE-0.6B-512px
109
+ PYTHONPATH=. python tools/run_train.py --cfg config/ace_0.6b_512_train.yaml
110
+ # ACE-0.6B-1024px
111
+ PYTHONPATH=. python tools/run_train.py --cfg config/ace_0.6b_1024_train.yaml
112
+ ```
113
+
114
+ ## 🚀 Inference
115
+
116
+ We provide a simple inference demo that lets users generate or edit images from a text instruction, optionally together with an input image.
117
+ ```bash
118
+ PYTHONPATH=. python tools/run_inference.py --cfg config/inference_config/models/ace_0.6b_512.yaml --instruction "make the boy cry, his eyes filled with tears" --seed 199999 --input_image examples/input_images/example0.webp
119
+ ```
120
+ We recommend running the bundled examples for quick testing. The following command runs the example inference and saves the results in `examples/output_images/`.
121
+ ```bash
122
+ PYTHONPATH=. python tools/run_inference.py --cfg config/inference_config/models/ace_0.6b_512.yaml
123
+ ```
124
+
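To run several instructions in a batch, the commands above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the repository. It uses only the flags shown in the commands above (`--cfg`, `--instruction`, `--seed`, `--input_image`) and assumes each optional flag can be omitted independently, which is only confirmed for the case where all three are omitted.

```python
import shlex


def build_inference_command(cfg, instruction=None, seed=None, input_image=None):
    """Assemble the argument list for tools/run_inference.py."""
    cmd = ["python", "tools/run_inference.py", "--cfg", cfg]
    if instruction is not None:
        cmd += ["--instruction", instruction]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if input_image is not None:
        cmd += ["--input_image", input_image]
    return cmd


cmd = build_inference_command(
    "config/inference_config/models/ace_0.6b_512.yaml",
    instruction="make the boy cry, his eyes filled with tears",
    seed=199999,
    input_image="examples/input_images/example0.webp",
)
# Print a shell-safe command line; the instruction is quoted automatically.
print("PYTHONPATH=. " + shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)` from the repository root, which avoids manual shell quoting of the instruction text.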
125
+ ## πŸ“ Citation
126
+
127
+ ```bibtex
128
+ @article{han2024ace,
129
+ title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
130
+ author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
131
+ journal={arXiv preprint arXiv:2410.00086},
132
+ year={2024}
133
+ }
134
+ ```