Update README.md
UniDiffuser is a unified diffusion framework to fit all distributions relevant to a set of multi-modal data in one transformer.
UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.
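The task-by-timestep mechanism above can be made concrete with a minimal sketch, assuming the convention from the UniDiffuser paper: the shared network takes one timestep per modality, a clean conditioning modality gets t = 0, and a marginalized-out modality gets the maximum timestep T. The constant `T`, the function name, and the task labels are illustrative assumptions, not part of any real API.

```python
T = 1000  # assumed maximum diffusion timestep

def select_timesteps(task: str, t: int) -> tuple[int, int]:
    """Return (t_image, t_text) for one denoising step at time t."""
    if task == "joint":          # image-text pair generation: both modalities noisy
        return t, t
    if task == "text_to_image":  # text is a clean condition
        return t, 0
    if task == "image_to_text":  # image is a clean condition
        return 0, t
    if task == "image":          # marginal image generation: text fully noised
        return t, T
    if task == "text":           # marginal text generation: image fully noised
        return T, t
    raise ValueError(f"unknown task: {task}")
```

Because only the timestep pair changes between tasks, no extra parameters or forward passes are needed, which is what "without additional overhead" refers to.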
Specifically, UniDiffuser employs a transformer variant called [U-ViT](https://github.com/baofff/U-ViT), which parameterizes the joint noise prediction network. The other components act as encoders and decoders for the different modalities: a pretrained image autoencoder from [Stable Diffusion](https://github.com/CompVis/stable-diffusion), a pretrained [image ViT-B/32 CLIP encoder](https://github.com/openai/CLIP), a pretrained [text ViT-L CLIP encoder](https://huggingface.co/openai/clip-vit-large-patch14), and a [GPT-2](https://github.com/openai/gpt-2) text decoder that we fine-tuned ourselves.
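To show how these components fit together, here is a hedged sketch of one text-to-image denoising step; every shape, dimension, and function name is an illustrative stand-in (the latent shape and embedding widths match the cited pretrained models, but nothing here is the actual UniDiffuser code).

```python
import numpy as np

LATENT_SHAPE = (4, 64, 64)  # Stable Diffusion autoencoder latent (assumed resolution)
CLIP_IMG_DIM = 512          # ViT-B/32 CLIP image embedding width
CLIP_TXT_DIM = 768          # ViT-L CLIP text embedding width (per token)

def encode_image(image: np.ndarray):
    """Stand-in for the SD autoencoder + CLIP image encoder."""
    return np.zeros(LATENT_SHAPE), np.zeros(CLIP_IMG_DIM)

def encode_text(text: str) -> np.ndarray:
    """Stand-in for the CLIP text encoder (77 tokens)."""
    return np.zeros((77, CLIP_TXT_DIM))

def u_vit(img_latent, clip_img, text_emb, t_img, t_text):
    """Stand-in for the joint noise-prediction U-ViT: one network returns
    noise estimates shaped like its two modality inputs."""
    return np.zeros_like(img_latent), np.zeros_like(text_emb)

# One text-to-image step: the text embedding is a clean condition (t_text = 0),
# the image latent is denoised at t_img.
latent, clip_img = encode_image(np.zeros((512, 512, 3)))
text_emb = encode_text("a photo of a corgi")
eps_img, eps_text = u_vit(latent, clip_img, text_emb, t_img=500, t_text=0)
```

The design point is that only the central U-ViT is trained jointly; the modality encoders and decoders are reused pretrained models, so the joint network operates entirely in their embedding spaces.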
We provide two versions of UniDiffuser: