---
license: apache-2.0
language:
- en
tags:
- Diffusion Transformer
- Image Editing
- Scepter
- ACE
---
<h2 align="center">
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
</h2>

<h3 align="center">
<b>Tongyi Lab, Alibaba Group</b>
</h3>

<div align="center">

[**Paper**](https://arxiv.org/abs/2410.00086) **|** [**Project Page**](https://ali-vilab.github.io/ace-page/) **|** [**Code**](https://github.com/ali-vilab/ACE)

</div>

ACE is a unified foundational model framework that supports a wide range of visual generation tasks.
By defining the Condition Unit (CU) to unify multi-modal inputs across different tasks, and by incorporating long-context CUs,
we introduce historical contextual information into visual generation tasks, paving
the way for ChatGPT-like dialog systems in visual generation.

<p>
<table align="center">
<tr>
<td>
<img src="assets/figures/teaser.png">
</td>
</tr>
</table>
</p>

## BibTeX

```bibtex
@article{han2024ace,
  title={ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer},
  author={Han, Zhen and Jiang, Zeyinzi and Pan, Yulin and Zhang, Jingfeng and Mao, Chaojie and Xie, Chenwei and Liu, Yu and Zhou, Jingren},
  journal={arXiv preprint arXiv:2410.00086},
  year={2024}
}
```