LMD+ Model Card
Paper | Project Page | 5-minute Blog Post | Demo | Code | Citation | Related work: LLM-grounded Video Diffusion Models
LMD and LMD+ greatly improve the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. This improves spatial reasoning, understanding of negation, attribute binding, generative numeracy, etc., in a unified manner without explicitly targeting each. LMD is completely training-free (i.e., it uses the SD model off-the-shelf). LMD+ additionally uses adapters for better control. This is a reproduction of the LMD+ model used in our work. Our full codebase is here.
This LMD+ model is based on Stable Diffusion v1.4 and integrates the adapters trained with GLIGEN. The model can be used directly with our LLMGroundedDiffusionPipeline, which is a simplified LMD+ pipeline without per-box generation.
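A minimal usage sketch with the `llm_grounded_diffusion` community pipeline in `diffusers`; the Hub checkpoint id `longlian/lmd_plus` and the example phrases and boxes are assumptions for illustration:

```python
import torch
from diffusers import DiffusionPipeline

# Load this checkpoint with the LLM-grounded Diffusion community pipeline.
# The Hub id and variant are assumptions; adjust to your setup.
pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus",
    custom_pipeline="llm_grounded_diffusion",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Layout from the LLM planner: one phrase per box,
# boxes as normalized [x_min, y_min, x_max, y_max] coordinates.
prompt = "a waterfall and a modern high speed train in a beautiful forest with fall foliage"
phrases = ["a waterfall", "a modern high speed train"]
boxes = [[0.14, 0.21, 0.43, 0.71], [0.50, 0.44, 0.85, 0.73]]

image = pipe(
    prompt=prompt,
    phrases=phrases,
    boxes=boxes,
    gligen_scheduled_sampling_beta=0.4,  # fraction of steps that apply grounded attention
    num_inference_steps=50,
).images[0]
image.save("lmd_plus_example.png")
```

In practice, the phrases and boxes come from the LLM layout planner described above rather than being hand-written.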
See the original SD Model Card here.
Cite our work
```bibtex
@article{lian2023llmgrounded,
  title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
  author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
  journal={arXiv preprint arXiv:2305.13655},
  year={2023}
}
```