EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
---

These are the official weights of the model "Edgen", trained by EvolveDirector. For more details, please refer to our paper and code repo.

## Setup

### Requirements

1. Build a virtual environment for EvolveDirector

```shell
# create virtual environment for EvolveDirector
conda create -n evolvedirector python=3.9
conda activate evolvedirector

# cd to the path of this repo

# install packages
pip install --upgrade pip
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
```

## Usage

1. Inference

```shell
# put your text prompts in the file passed to --txt_file
python Inference/inference.py --image_size=1024 \
    --t5_path "./model" \
    --tokenizer_path "./model/sd-vae-ft-ema" \
    --txt_file "text_prompts.txt" \
    --model_path "model/Edgen_1024px_v1.pth" \
    --save_folder "output/test_model"
```

## Citation

```bibtex
@misc{zhao2024evolvedirectorapproachingadvancedtexttoimage,
      title={EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models},
      author={Rui Zhao and Hangjie Yuan and Yujie Wei and Shiwei Zhang and Yuchao Gu and Lingmin Ran and Xiang Wang and Zhangjie Wu and Junhao Zhang and Yingya Zhang and Mike Zheng Shou},
      year={2024},
      eprint={2410.07133},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.07133},
}
```

## Shoutouts

- This code builds heavily on [PixArt-$\alpha$](https://github.com/PixArt-alpha/PixArt-alpha/). Thanks for open-sourcing!
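
A note on the inference step under Usage: `--txt_file` points to a plain-text file of prompts. The sketch below shows one way to create such a file. It assumes the script reads one prompt per line, which is not stated here and should be checked against `Inference/inference.py`. The prompts themselves are illustrative placeholders.

```shell
# Assumption: one prompt per line; verify against Inference/inference.py.
# The prompts below are made-up examples, not taken from the paper.
cat > text_prompts.txt <<'EOF'
A red fox sitting in a snowy forest at sunrise
A watercolor painting of a lighthouse on a rocky coast
A close-up photo of a ceramic teapot on a wooden table
EOF

# Sanity check: print how many prompts were written
wc -l < text_prompts.txt
```

With the file in place, the inference command above can be run unchanged, since it already passes `--txt_file "text_prompts.txt"`.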