Commit 9dc6d7f by Xu Cao (parent: 3c12e7b) — update README.md

README.md CHANGED
Diff summary (hunks `@@ -1,4 +1,4 @@`, `@@ -8,7 +8,27 @@`, `@@ -20,7 +40,6 @@ def preprocess_image(image):`): the title `# Stable Diffusion 3` becomes `# Stable Diffusion 3 Inpainting Pipeline`, a Model section is added before the demo, and the normalization line `image = image * 2 - 1` is removed from `preprocess_image`. The updated README follows.
# Stable Diffusion 3 Inpainting Pipeline

| input image | input mask image | output |
|:-------------------------:|:-------------------------:|:-------------------------:|
**Please ensure that the version of diffusers >= 0.29.1**
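If you want to verify that requirement at runtime, here is a minimal standard-library sketch. The `meets_minimum` helper is illustrative (not part of diffusers) and assumes simple dotted numeric version strings:

```python
def meets_minimum(installed: str, required: str = "0.29.1") -> bool:
    """Return True when a dotted version string is at least the required one."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

print(meets_minimum("0.29.1"))  # True
print(meets_minimum("0.28.2"))  # False
```

For real projects, `packaging.version.Version` handles pre-release and post-release suffixes that this sketch does not.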
## Model

[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model description:** This model can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).
## Demo

Make sure you upgrade to the latest version of diffusers: `pip install -U diffusers`. Then you can run:
```python
import torch
from torchvision import transforms

# ...

def preprocess_image(image):
    # Convert to RGB and center-crop each side down to a multiple of 64.
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    # Convert to a [0, 1] float tensor, add a batch dimension, and move to the GPU.
    image = transforms.ToTensor()(image)
    image = image.unsqueeze(0).to("cuda")
    return image
```
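The `// 64 * 64` expression in `preprocess_image` floors each image dimension to a multiple of 64, presumably so the latent shapes divide evenly through the model's downsampling stages. A standalone sketch of that arithmetic (`crop_target` is a hypothetical helper, not part of the demo):

```python
def crop_target(height: int, width: int, multiple: int = 64) -> tuple[int, int]:
    # Floor each dimension to the nearest lower multiple, matching the
    # CenterCrop target computed in preprocess_image.
    return (height // multiple * multiple, width // multiple * multiple)

print(crop_target(513, 768))   # (512, 768)
print(crop_target(1000, 999))  # (960, 960)
```

Dimensions already divisible by 64 pass through unchanged, so the crop is a no-op for standard sizes like 512 or 1024.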