mrtuandao committed on
Commit ab4818d
1 Parent(s): 8ef0ca3

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,27 +1,35 @@
  ---
- tasks:
- - text-to-image-synthesis
- model-type:
- - stable_diffusion
- domain:
- - mm
- frameworks:
- - pytorch
- customized-quickstart: False
- finetune-support: False
  license: creativeml-openrail-m
- language:
- - cn
- - en
  tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
- - zh
- - Chinese
-
+ inference: false
+ library_name: diffusers
+ extra_gated_prompt: |-
+   One more step before getting this model.
+   This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
+   The CreativeML OpenRAIL License specifies:
+
+   1. You can't use the model to deliberately produce or share illegal or harmful outputs or content.
+   2. CompVis claims no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license.
+   3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users (please read the license entirely and carefully).
+   Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
+
+   By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.
+
+ extra_gated_fields:
+   I have read the License and agree with its terms: checkbox
  ---

+ # Re-upload
+
+ This repository is being re-uploaded to Hugging Face in accordance with [the CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license) under which it was originally uploaded, specifically **Section II**, which grants:
+
+ > ...a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
+
+ Note that these files did not come from Hugging Face but from [modelscope](https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-v1-5). As such, some files present in the original repository may be absent here. File integrity has been verified via checksum.
+
  # Stable Diffusion v1-5 Model Card

  Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
@@ -30,35 +38,17 @@ For more information about how Stable Diffusion functions, please have a look at
  The **Stable-Diffusion-v1-5** checkpoint was initialized with the weights of the [Stable-Diffusion-v1-2](https://huggingface.co/CompVis/stable-diffusion-v1-2)
  checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).

- You can use this both with the [🧨Diffusers library](https://github.com/huggingface/diffusers) and the [RunwayML GitHub repository](https://github.com/runwayml/stable-diffusion).
-
-
- ### modelscope usage
-
- ```python
- from modelscope.utils.constant import Tasks
- from modelscope.pipelines import pipeline
- import cv2
-
- pipe = pipeline(task=Tasks.text_to_image_synthesis,
-                 model='AI-ModelScope/stable-diffusion-v1-5',
-                 model_revision='v1.0.0')
-
- prompt = '飞流直下三千尺,油画'  # "A waterfall plunging straight down three thousand feet; oil painting"
- output = pipe({'text': prompt})
- cv2.imwrite('result.png', output['output_imgs'][0])
-
- ```
-
-
+ You can use this with the [🧨Diffusers library](https://github.com/huggingface/diffusers).

  ### Diffusers usage
  ```py
  from diffusers import StableDiffusionPipeline
  import torch

- model_id = "runwayml/stable-diffusion-v1-5"
- pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "benjamin-paine/stable-diffusion-v1-5",
+     torch_dtype=torch.float16
+ )
  pipe = pipe.to("cuda")

  prompt = "a photo of an astronaut riding a horse on mars"
@@ -66,15 +56,9 @@ image = pipe(prompt).images[0]

  image.save("astronaut_rides_horse.png")
  ```
- For more detailed instructions, use-cases and examples in JAX follow the instructions [here](https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion)
-
- ### Original GitHub Repository

- 1. Download the weights
- - [v1-5-pruned-emaonly.ckpt](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt) - 4.27 GB, EMA-only weights; uses less VRAM, suitable for inference
- - [v1-5-pruned.ckpt](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.ckpt) - 7.7 GB, EMA + non-EMA weights; uses more VRAM, suitable for fine-tuning
+ For more detailed instructions, use cases, and examples in JAX, follow the instructions [here](https://github.com/huggingface/diffusers#text-to-image-generation-with-stable-diffusion).

- 2. Follow instructions [here](https://github.com/runwayml/stable-diffusion).

  ## Model Details
  - **Developed by:** Robin Rombach, Patrick Esser
@@ -163,7 +147,6 @@ The concepts are intentionally hidden to reduce the likelihood of reverse-engine
  Specifically, the checker compares the class probability of harmful concepts in the embedding space of the `CLIPTextModel` *after generation* of the images.
  The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept.

-
  ## Training

  **Training Data**
@@ -187,8 +170,8 @@ Currently six Stable Diffusion checkpoints are provided, which were trained as f
  filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata; the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor).
  - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2` - 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
  - [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4): Resumed from `stable-diffusion-v1-2` - 225,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- - [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5): Resumed from `stable-diffusion-v1-2` - 595,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- - [`stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting): Resumed from `stable-diffusion-v1-5` - then 440,000 steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask the whole image.
+ - [`stable-diffusion-v1-5`](https://huggingface.co/benjamin-paine/stable-diffusion-v1-5): Resumed from `stable-diffusion-v1-2` - 595,000 steps at resolution `512x512` on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
+ - [`stable-diffusion-inpainting`](https://huggingface.co/benjamin-paine/stable-diffusion-inpainting): Resumed from `stable-diffusion-v1-5` - then 440,000 steps of inpainting training at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask the whole image.

  - **Hardware:** 32 x 8 x A100 GPUs
  - **Optimizer:** AdamW
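This commit also adds half-precision weight files for every component (listed below). A minimal sketch of loading them directly, assuming a diffusers release recent enough that `from_pretrained` accepts a `variant` argument, with the repository id from the usage example above:

```py
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    variant="fp16",            # select the *.fp16.safetensors files added in this commit
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_rides_horse.png")
```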
 
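The card repeatedly notes that 10% of the text-conditioning was dropped during training to improve classifier-free guidance sampling. As an illustrative sketch (hypothetical function and tensor names, not diffusers API), the guidance step combines the unconditional and text-conditioned noise predictions like this:

```py
import torch

def guided_noise(eps_uncond: torch.Tensor,
                 eps_text: torch.Tensor,
                 guidance_scale: float = 7.5) -> torch.Tensor:
    # Dropping the text-conditioning for 10% of training steps teaches the
    # model an unconditional prediction; sampling then extrapolates from it
    # toward the text-conditioned one. guidance_scale = 1.0 recovers the
    # purely text-conditioned prediction.
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)
```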
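For the inpainting checkpoint described above, the 5 extra UNet input channels mean the network consumes 9 channels in total. A rough sketch of how such an input could be assembled (hypothetical tensor names and shapes):

```py
import torch

latents = torch.randn(1, 4, 64, 64)               # noisy image latents (4 channels)
mask = torch.ones(1, 1, 64, 64)                   # 1 where the image should be repainted
masked_image_latents = torch.randn(1, 4, 64, 64)  # encoding of the masked input image

# 4 + 1 + 4 = 9 input channels for the inpainting UNet
unet_input = torch.cat([latents, mask, masked_image_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```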
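The safety-checker behaviour described in the diff (per-concept scores compared against hand-engineered weights) can be read as a per-concept similarity threshold. Below is a simplified, hypothetical sketch of that screening rule; the actual implementation is the `StableDiffusionSafetyChecker` class shipped with the model.

```py
import torch
import torch.nn.functional as F

def flags_nsfw(image_embed: torch.Tensor,      # (d,) CLIP embedding of the generated image
               concept_embeds: torch.Tensor,   # (n, d) embeddings of the hidden concepts
               concept_weights: torch.Tensor   # (n,) hand-engineered per-concept thresholds
               ) -> bool:
    # Flag the image if its similarity to any hidden concept exceeds that
    # concept's threshold.
    sims = F.cosine_similarity(image_embed.unsqueeze(0), concept_embeds, dim=-1)
    return bool((sims > concept_weights).any())
```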
safety_checker/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ab18aa51a16086ed7588220e4e0c113069f66937440010f29d470b5fd476be9
+ size 608016874
safety_checker/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94e5774e81a871d658982a10e36cb33df9d5bf8a437cc194cc3a754a0fb2cc81
+ size 608102483
text_encoder/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f218f1ad781c14b425029ad2fddfe30fdb21a602ebf50a137892c38c1fdac3b
+ size 246144378
text_encoder/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e4eef3290f9ac8c9bb9a9f9b0a570f8234f68b9c41a48f5f5d9ba493b2cdb3e
+ size 246187019
unet/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc59c7ed3d78a30c0b462fabe0ea9f34b5995609ced7748ad58ea5e96d852779
+ size 1719328368
unet/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:628d8d51f9278641c7d7e0cb4b7987f2a5ea84dbd99ac204c7c95fa7a4bcd999
+ size 1719125272
v1-5-pruned-emaonly.fp16.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f30f876d4741a0a53d08a0b58d4488c80fd61268115899eac92e9ed8e9afbdd
+ size 4265400990
v1-5-pruned-emaonly.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92954befdb6aacf52f86095eba54ac9262459bc21f987c0e51350f3679c4e45a
+ size 2132650708
v1-5-pruned.fp16.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5a69c37200d61672c8ba3161a1fa675a42510bfdfd97424379de737f6244dab
+ size 7703824434
v1-5-pruned.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:009eed2ef17d951bea4f7af1c207eb11b1572c9a6fc093a89c63ef2cab94ef7d
+ size 3851786764
vae/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:448aecf43f5c42d73da487cddf36b659b2ba5988179a557818e6641c87f5eb4f
+ size 167405870
vae/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a26b615f1542b39cb0b0ce7847d0885bd682abcbadc1f0f8a1c82220f533aea
+ size 167335318
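Each `ADDED` entry above is a git-lfs pointer whose `oid` is the SHA-256 digest of the underlying file, so the integrity claim in the re-upload note can be reproduced locally. A minimal sketch, assuming the file has been downloaded to a matching local path:

```py
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so multi-GB checkpoints don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid taken from the safety_checker/model.fp16.safetensors pointer above
expected = "3ab18aa51a16086ed7588220e4e0c113069f66937440010f29d470b5fd476be9"
print(sha256_of("safety_checker/model.fp16.safetensors") == expected)
```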