Xu Cao committed on
Commit
9dc6d7f
1 Parent(s): 3c12e7b

update README.md

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -1,4 +1,4 @@
-# Stable Diffusion 3 Inpaint Pipeline
+# Stable Diffusion 3 Inpainting Pipeline
 
 | input image | input mask image | output |
 |:-------------------------:|:-------------------------:|:-------------------------:|
@@ -8,7 +8,27 @@
 
 **Please ensure that the version of diffusers >= 0.29.1**
 
-# Demo
+## Model
+
+[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.
+
+For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
+
+Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.
+
+### Model Description
+
+- **Developed by:** Stability AI
+- **Model type:** MMDiT text-to-image generative model
+- **Model description:** This model can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders: [OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl).
+
+## Demo
+
+Make sure you upgrade to the latest version of diffusers (`pip install -U diffusers`). Then you can run:
+
 ```python
 import torch
 from torchvision import transforms
@@ -20,7 +40,6 @@ def preprocess_image(image):
     image = image.convert("RGB")
     image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
     image = transforms.ToTensor()(image)
-    image = image * 2 - 1
     image = image.unsqueeze(0).to("cuda")
     return image
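The `preprocess_image` helper in the diff crops each side down to the nearest multiple of 64 (matching the model's latent grid) and, before this commit, also rescaled pixel values from `[0, 1]` to `[-1, 1]`; the rescaling line is what the commit removes. The arithmetic can be sketched without torch; `crop_size` and `to_signed_range` are hypothetical helper names used here only for illustration:

```python
def crop_size(width, height):
    # Mirror the CenterCrop target used in preprocess_image:
    # (height // 64 * 64, width // 64 * 64) rounds each side
    # down to the nearest multiple of 64.
    return (height // 64 * 64, width // 64 * 64)

def to_signed_range(x):
    # The normalization this commit removes:
    # map a pixel value in [0, 1] into [-1, 1].
    return x * 2 - 1

print(crop_size(1000, 542))   # -> (512, 960): a 1000x542 image is center-cropped to 960x512
print(to_signed_range(0.5))   # -> 0.0
```

Dropping the `* 2 - 1` step suggests the pipeline now expects inputs in `[0, 1]` and handles any rescaling internally, though the diff itself does not state this.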
45