Commit 9dc6d7f by Xu Cao (parent: 3c12e7b) — update README.md

README.md CHANGED
Diff summary (hunks `@@ -1,4 +1,4 @@`, `@@ -8,7 +8,27 @@`, `@@ -20,7 +40,6 @@ def preprocess_image(image):`): the title `# Stable Diffusion 3` becomes `# Stable Diffusion 3 Inpainting Pipeline`, a Model section is added before the demo, and the normalization line `image = image * 2 - 1` is removed from `preprocess_image`. The updated README follows.
# Stable Diffusion 3 Inpainting Pipeline

| input image | input mask image | output |
|:-------------------------:|:-------------------------:|:-------------------------:|
**Please ensure that the version of diffusers >= 0.29.1**
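If you want to verify that requirement at runtime, here is a minimal standard-library sketch. The `meets_minimum` helper is illustrative (not part of diffusers) and assumes simple dotted numeric version strings:

```python
def meets_minimum(installed: str, required: str = "0.29.1") -> bool:
    """Return True when a dotted version string is at least the required one."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

print(meets_minimum("0.29.1"))  # True
print(meets_minimum("0.28.2"))  # False
```

For real projects, `packaging.version.Version` handles pre-release and post-release suffixes that this sketch does not.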
## Model

[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

For more technical details, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper).

Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License, visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model description:** This model can be used to generate images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl)).
## Demo

Make sure you upgrade to the latest version of diffusers: `pip install -U diffusers`. Then you can run:
```python
import torch
from torchvision import transforms

# ...

def preprocess_image(image):
    # Convert to RGB and center-crop each side down to a multiple of 64.
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    # Convert to a [0, 1] float tensor, add a batch dimension, and move to the GPU.
    image = transforms.ToTensor()(image)
    image = image.unsqueeze(0).to("cuda")
    return image
```
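The `// 64 * 64` expression in `preprocess_image` floors each image dimension to a multiple of 64, presumably so the latent shapes divide evenly through the model's downsampling stages. A standalone sketch of that arithmetic (`crop_target` is a hypothetical helper, not part of the demo):

```python
def crop_target(height: int, width: int, multiple: int = 64) -> tuple[int, int]:
    # Floor each dimension to the nearest lower multiple, matching the
    # CenterCrop target computed in preprocess_image.
    return (height // multiple * multiple, width // multiple * multiple)

print(crop_target(513, 768))   # (512, 768)
print(crop_target(1000, 999))  # (960, 960)
```

Dimensions already divisible by 64 pass through unchanged, so the crop is a no-op for standard sizes like 512 or 1024.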