mrtuandao committed on
Commit ab4818d
1 Parent(s): 8ef0ca3

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,27 +1,35 @@
  ---
- tasks:
- - text-to-image-synthesis
- model-type:
- - stable_diffusion
- domain:
- - mm
- frameworks:
- - pytorch
- customized-quickstart: False
- finetune-support: False
  license: creativeml-openrail-m
- language:
- - cn
- - en
  tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
- - zh
- - Chinese
-
+ inference: false
+ library_name: diffusers
+ extra_gated_prompt: |-
+   One more step before getting this model.
+   This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
+   The CreativeML OpenRAIL License specifies:
+
+   1. You can't use the model to deliberately produce or share illegal or harmful outputs or content.
+   2. CompVis claims no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license.
+   3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully).
+   Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
+
+   By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.
+
+ extra_gated_fields:
+   I have read the License and agree with its terms: checkbox
  ---

+ # Re-upload
+
+ This repository is being re-uploaded to Hugging Face in accordance with [the CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license) under which it was originally uploaded, specifically **Section II**, which grants:
+
+ > ...a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
+
+ Note that these files did not come from Hugging Face but from [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5). As such, some files present in the original repository may be absent here. File integrity has been verified via checksum.
+
  # Stable Diffusion v1-5 Model Card

  Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
@@ -30,35 +38,17 @@ For more information about how Stable Diffusion functions, please have a look at
  The **Stable-Diffusion-v1-5** checkpoint was initialized with the weights of the [Stable-Diffusion-v1-2](https://huggingface.co/CompVis/stable-diffusion-v1-2)
  checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).

- You can use this both with the [🧨Diffusers library](https://github.com/huggingface/diffusers) and the [RunwayML GitHub repository](https://github.com/runwayml/stable-diffusion).
-
-
- ### modelscope usage
-
- ```python
- from modelscope.utils.constant import Tasks
- from modelscope.pipelines import pipeline
- import cv2
-
- pipe = pipeline(task=Tasks.text_to_image_synthesis,
-                 model='AI-ModelScope/stable-diffusion-v1-5',
-                 model_revision='v1.0.0')
-
- prompt = '飞流直下三千尺,油画'  # "A waterfall plunging straight down three thousand feet; oil painting"
- output = pipe({'text': prompt})
- cv2.imwrite('result.png', output['output_imgs'][0])
-
- ```
-
-
+ You can use this with the [🧨Diffusers library](https://github.com/huggingface/diffusers).

  ### Diffusers usage
  ```py
  from diffusers import StableDiffusionPipeline
  import torch

- model_id = "runwayml/stable-diffusion-v1-5"
- pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "benjamin-paine/stable-diffusion-v1-5",
+     torch_dtype=torch.float16
+ )
  pipe = pipe.to("cuda")

  prompt = "a photo of an astronaut riding a horse on mars"
@@ -66,15 +56,9 @@ image = pipe(prompt).images[0]

  image.save("astronaut_rides_horse.png")
  ```
- For more detailed instructions, use-cases and examples in JAX follow the instructions [here](https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion)
-
- ### Original GitHub Repository

- 1. Download the weights
- - [v1-5-pruned-emaonly.ckpt](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt) - 4.27 GB, EMA-only weights; uses less VRAM, suitable for inference
- - [v1-5-pruned.ckpt](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.ckpt) - 7.7 GB, EMA + non-EMA weights; uses more VRAM, suitable for fine-tuning
+ For more detailed instructions, use cases, and examples in JAX, follow the instructions [here](https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion).

- 2. Follow instructions [here](https://github.com/runwayml/stable-diffusion).

  ## Model Details
  - **Developed by:** Robin Rombach, Patrick Esser
@@ -163,7 +147,6 @@ The concepts are intentionally hidden to reduce the likelihood of reverse-engine
  Specifically, the checker compares the class probability of harmful concepts in the embedding space of the `CLIPTextModel` *after generation* of the images.
  The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept.

-
  ## Training

  **Training Data**
@@ -187,8 +170,8 @@ Currently six Stable Diffusion checkpoints are provided, which were trained as f
  filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata; the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor).
  - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2` - 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
  - [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4): Resumed from `stable-diffusion-v1-2` - 225,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- - [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5): Resumed from `stable-diffusion-v1-2` - 595,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- - [`stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting): Resumed from `stable-diffusion-v1-5` - then 440,000 steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask the whole image.
+ - [`stable-diffusion-v1-5`](https://huggingface.co/benjamin-paine/stable-diffusion-v1-5): Resumed from `stable-diffusion-v1-2` - 595,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
+ - [`stable-diffusion-inpainting`](https://huggingface.co/benjamin-paine/stable-diffusion-inpainting): Resumed from `stable-diffusion-v1-5` - then 440,000 steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask the whole image.

  - **Hardware:** 32 x 8 x A100 GPUs
  - **Optimizer:** AdamW
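This commit also adds half-precision weight files for every component (listed below). A minimal sketch of loading them directly, assuming a diffusers release recent enough that `from_pretrained` accepts a `variant` argument, with the repository id from the usage example above:

```py
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    variant="fp16",            # select the *.fp16.safetensors files added in this commit
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_rides_horse.png")
```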
 
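The card repeatedly notes that 10% of the text-conditioning was dropped during training to improve classifier-free guidance sampling. As an illustrative sketch (hypothetical function and tensor names, not diffusers API), the guidance step combines the unconditional and text-conditioned noise predictions like this:

```py
import torch

def guided_noise(eps_uncond: torch.Tensor,
                 eps_text: torch.Tensor,
                 guidance_scale: float = 7.5) -> torch.Tensor:
    # Dropping the text-conditioning for 10% of training steps teaches the
    # model an unconditional prediction; sampling then extrapolates from it
    # toward the text-conditioned one. guidance_scale = 1.0 recovers the
    # purely text-conditioned prediction.
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)
```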
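For the inpainting checkpoint described above, the 5 extra UNet input channels mean the network consumes 9 channels in total. A rough sketch of how such an input could be assembled (hypothetical tensor names and shapes):

```py
import torch

latents = torch.randn(1, 4, 64, 64)               # noisy image latents (4 channels)
mask = torch.ones(1, 1, 64, 64)                   # 1 where the image should be repainted
masked_image_latents = torch.randn(1, 4, 64, 64)  # encoding of the masked input image

# 4 + 1 + 4 = 9 input channels for the inpainting UNet
unet_input = torch.cat([latents, mask, masked_image_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```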
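The safety-checker behaviour described in the diff (per-concept scores compared against hand-engineered weights) can be read as a per-concept similarity threshold. Below is a simplified, hypothetical sketch of that screening rule; the actual implementation is the `StableDiffusionSafetyChecker` class shipped with the model.

```py
import torch
import torch.nn.functional as F

def flags_nsfw(image_embed: torch.Tensor,      # (d,) CLIP embedding of the generated image
               concept_embeds: torch.Tensor,   # (n, d) embeddings of the hidden concepts
               concept_weights: torch.Tensor   # (n,) hand-engineered per-concept thresholds
               ) -> bool:
    # Flag the image if its similarity to any hidden concept exceeds that
    # concept's threshold.
    sims = F.cosine_similarity(image_embed.unsqueeze(0), concept_embeds, dim=-1)
    return bool((sims > concept_weights).any())
```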
safety_checker/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ab18aa51a16086ed7588220e4e0c113069f66937440010f29d470b5fd476be9
+ size 608016874
safety_checker/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94e5774e81a871d658982a10e36cb33df9d5bf8a437cc194cc3a754a0fb2cc81
+ size 608102483
text_encoder/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f218f1ad781c14b425029ad2fddfe30fdb21a602ebf50a137892c38c1fdac3b
+ size 246144378
text_encoder/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e4eef3290f9ac8c9bb9a9f9b0a570f8234f68b9c41a48f5f5d9ba493b2cdb3e
+ size 246187019
unet/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc59c7ed3d78a30c0b462fabe0ea9f34b5995609ced7748ad58ea5e96d852779
+ size 1719328368
unet/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:628d8d51f9278641c7d7e0cb4b7987f2a5ea84dbd99ac204c7c95fa7a4bcd999
+ size 1719125272
v1-5-pruned-emaonly.fp16.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f30f876d4741a0a53d08a0b58d4488c80fd61268115899eac92e9ed8e9afbdd
+ size 4265400990
v1-5-pruned-emaonly.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92954befdb6aacf52f86095eba54ac9262459bc21f987c0e51350f3679c4e45a
+ size 2132650708
v1-5-pruned.fp16.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5a69c37200d61672c8ba3161a1fa675a42510bfdfd97424379de737f6244dab
+ size 7703824434
v1-5-pruned.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:009eed2ef17d951bea4f7af1c207eb11b1572c9a6fc093a89c63ef2cab94ef7d
+ size 3851786764
vae/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:448aecf43f5c42d73da487cddf36b659b2ba5988179a557818e6641c87f5eb4f
+ size 167405870
vae/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a26b615f1542b39cb0b0ce7847d0885bd682abcbadc1f0f8a1c82220f533aea
+ size 167335318
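Each `ADDED` entry above is a git-lfs pointer whose `oid` is the SHA-256 digest of the underlying file, so the integrity claim in the re-upload note can be reproduced locally. A minimal sketch, assuming the file has been downloaded to a matching local path:

```py
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so multi-GB checkpoints don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid taken from the safety_checker/model.fp16.safetensors pointer above
expected = "3ab18aa51a16086ed7588220e4e0c113069f66937440010f29d470b5fd476be9"
print(sha256_of("safety_checker/model.fp16.safetensors") == expected)
```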