### EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
---

These are the official model weights of the model "Edgen", trained by EvolveDirector. For more details, please refer to our paper and code repo.

## Setup

### Requirements

1. Build a virtual environment for EvolveDirector

```shell
# create virtual environment for EvolveDirector
conda create -n evolvedirector python=3.9
conda activate evolvedirector

# cd to the path of this repo

# install packages
pip install --upgrade pip
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
```

## Usage

1. Inference

```shell
# put your text prompts in the file passed to --txt_file
python Inference/inference.py --image_size=1024 \
    --t5_path "./model" \
    --tokenizer_path "./model/sd-vae-ft-ema" \
    --txt_file "text_prompts.txt" \
    --model_path "model/Edgen_1024px_v1.pth" \
    --save_folder "output/test_model"
```

## Citation

```bibtex
@article{zhao2024evolvedirector,
  title={EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models},
  author={Zhao, Rui and Yuan, Hangjie and Wei, Yujie and Zhang, Shiwei and Gu, Yuchao and Ran, Lingmin and Wang, Xiang and Wu, Zhangjie and Zhang, Junhao and Zhang, Yingya and others},
  journal={arXiv preprint arXiv:2410.07133},
  year={2024}
}
```

## Shoutouts

- This code builds heavily on [PixArt-$\alpha$](https://github.com/PixArt-alpha/PixArt-alpha/). Thanks for open-sourcing!