Spaces: svjack/Wuerstchen

svjack committed on Nov 21, 2023

Commit 7730485 (1 parent: 65a22d6)

Delete wuerstchen

Files changed (18)

wuerstchen/.gitattributes +0 -35
wuerstchen/README.md +0 -90
wuerstchen/decoder/config.json +0 -43
wuerstchen/decoder/diffusion_pytorch_model.bin +0 -3
wuerstchen/decoder/diffusion_pytorch_model.safetensors +0 -3
wuerstchen/model_index.json +0 -25
wuerstchen/scheduler/scheduler_config.json +0 -6
wuerstchen/text_encoder/config.json +0 -25
wuerstchen/text_encoder/model.safetensors +0 -3
wuerstchen/text_encoder/pytorch_model.bin +0 -3
wuerstchen/tokenizer/merges.txt +0 -0
wuerstchen/tokenizer/special_tokens_map.json +0 -24
wuerstchen/tokenizer/tokenizer.json +0 -0
wuerstchen/tokenizer/tokenizer_config.json +0 -33
wuerstchen/tokenizer/vocab.json +0 -0
wuerstchen/vqgan/config.json +0 -13
wuerstchen/vqgan/diffusion_pytorch_model.bin +0 -3
wuerstchen/vqgan/diffusion_pytorch_model.safetensors +0 -3

wuerstchen/.gitattributes DELETED

@@ -1,35 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
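
Every `.bin` and `.safetensors` entry in the changed-files list shows `-3`: under these attributes the repository stores only a three-line Git LFS pointer for such files, so that is all the diff removes. A minimal sketch of how the slash-free patterns above select those files, assuming Python's `fnmatch.fnmatchcase` as a stand-in for Git's own matcher. Git compares a slash-free pattern against the file name at any directory depth; the `saved_model/**/*` pattern is deliberately left out of this approximation, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# All slash-free patterns from the deleted .gitattributes.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy", "*.npz",
    "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl", "*.pt", "*.pth",
    "*.rar", "*.safetensors", "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm",
    "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate .gitattributes matching for patterns that contain no '/'.

    Such patterns apply to the basename at any depth, so only the last path
    component is compared. Patterns with a '/' are skipped in this sketch.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns if "/" not in pat)


changed = [
    "wuerstchen/decoder/config.json",
    "wuerstchen/decoder/diffusion_pytorch_model.safetensors",
    "wuerstchen/tokenizer/vocab.json",
]
for path in changed:
    print(f"{path}: {'LFS pointer' if is_lfs_tracked(path) else 'regular file'}")
```

Files that match no pattern, such as the JSON configs and tokenizer files, stay ordinary text, which is why their full contents appear in the diff below.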

wuerstchen/README.md DELETED

@@ -1,90 +0,0 @@
----
-license: mit
-prior:
-- warp-diffusion/wuerstchen-prior
-tags:
-- text-to-image
-- wuerstchen
----
-<img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/i-DYpDHw8Pwiy7QBKZVR5.jpeg" width=1500>
-## Würstchen - Overview
-Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images. Why is this important? Compressing data can reduce
-computational costs for both training and inference by orders of magnitude. Training on 1024x1024 images is far more expensive than training at 32x32. Other works usually
-use relatively small compression, in the range of 4x to 8x spatial compression. Würstchen takes this to an extreme: through its novel design, we achieve a 42x spatial
-compression. This had not been achieved before, because common methods fail to faithfully reconstruct detailed images beyond 16x spatial compression. Würstchen employs a
-two-stage compression, which we call Stage A and Stage B. Stage A is a VQGAN, and Stage B is a Diffusion Autoencoder (more details can be found in the [paper](https://arxiv.org/abs/2306.00637)).
-A third model, Stage C, is learned in that highly compressed latent space. This training requires a fraction of the compute used for current top-performing models, while also allowing
-cheaper and faster inference.
-## Würstchen - Decoder
-The Decoder is what we refer to as "Stage A" and "Stage B". The decoder takes in image embeddings, either generated by the Prior (Stage C) or extracted from a real image, and decodes those latents back into pixel space. Specifically, Stage B first decodes the image embeddings into the VQGAN space, and Stage A (which is a VQGAN)
-decodes the latents into pixel space. Together, they achieve a 42x spatial compression.
-**Note:** The reconstruction is lossy and loses information from the image. The current Stage B often lacks details in the reconstructions, which are especially noticeable to
-us humans when looking at faces, hands, etc. We are working on making these reconstructions even better in the future!
-### Image Sizes
-Würstchen was trained on image resolutions between 1024x1024 and 1536x1536. We sometimes also observe good outputs at resolutions like 1024x2048. Feel free to try it out.
-We also observed that the Prior (Stage C) adapts extremely fast to new resolutions. So finetuning it at 2048x2048 should be computationally cheap.
-<img src="https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/5pA5KUfGmvsObqiIjdGY1.jpeg" width=1000>
-## How to run
-This pipeline should be run together with the prior at https://huggingface.co/warp-ai/wuerstchen-prior:
-```py
-import torch
-from diffusers import AutoPipelineForText2Image
-device = "cuda"
-dtype = torch.float16
-pipeline = AutoPipelineForText2Image.from_pretrained(
-    "warp-diffusion/wuerstchen", torch_dtype=dtype
-).to(device)
-caption = "Anthropomorphic cat dressed as a fire fighter"
-output = pipeline(
-    prompt=caption,
-    height=1024,
-    width=1024,
-    prior_guidance_scale=4.0,
-    decoder_guidance_scale=0.0,
-).images
-```
-### Image Sampling Times
-The figure shows the inference times (on an A100) for different batch sizes (`num_images_per_prompt`) on Würstchen compared to [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) (without refiner).
-The left figure shows inference times (using torch > 2.0), whereas the right figure applies `torch.compile` to both pipelines in advance.
-![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/634cb5eefb80cc6bcaf63c3e/UPhsIH2f079ZuTA_sLdVe.jpeg)
-## Model Details
-- **Developed by:** Pablo Pernias, Dominic Rampas
-- **Model type:** Diffusion-based text-to-image generation model
-- **Language(s):** English
-- **License:** MIT
-- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Diffusion model in the style of Stage C from the [Würstchen paper](https://arxiv.org/abs/2306.00637) that uses a fixed, pretrained text encoder ([CLIP ViT-bigG/14](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k)).
-- **Resources for more information:** [GitHub Repository](https://github.com/dome272/Wuerstchen), [Paper](https://arxiv.org/abs/2306.00637).
-- **Cite as:**
-      @misc{pernias2023wuerstchen,
-            title={Wuerstchen: Efficient Pretraining of Text-to-Image Models},
-            author={Pablo Pernias and Dominic Rampas and Mats L. Richter and Christopher Pal and Marc Aubreville},
-            year={2023},
-            eprint={2306.00637},
-            archivePrefix={arXiv},
-            primaryClass={cs.CV}
-      }
-## Environmental Impact
-**Würstchen v2** **Estimated Emissions**
-Based on the information below, we estimate the following CO2 emissions using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.
-- **Hardware Type:** A100 PCIe 40GB
-- **Hours used:** 24602
-- **Cloud Provider:** AWS
-- **Compute Region:** US-east
-- **Carbon Emitted (Power consumption x Time x Carbon produced based on location of power grid):** 2275.68 kg CO2 eq.
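
The emissions figure in the deleted card follows the formula stated in its last bullet: power consumption x time x carbon intensity of the grid. The card does not give the power draw or grid intensity it used, so this sketch assumes 250 W per GPU (the A100 PCIe 40GB board power) and 0.37 kg CO2 eq. per kWh. Those two values are picked because together they reproduce the stated 2275.68 kg, not because the card confirms them.

```python
# Figure from the model card.
GPU_HOURS = 24602  # "Hours used"

# Assumptions (not stated in the card), chosen because they reproduce 2275.68 kg.
GPU_POWER_KW = 0.250    # A100 PCIe 40GB board power, 250 W
GRID_KG_PER_KWH = 0.37  # assumed carbon intensity for the US-east region


def estimated_emissions_kg(hours: float, power_kw: float, kg_per_kwh: float) -> float:
    """Power consumption x time x carbon intensity of the power grid."""
    energy_kwh = power_kw * hours
    return energy_kwh * kg_per_kwh


energy = GPU_POWER_KW * GPU_HOURS
emissions = estimated_emissions_kg(GPU_HOURS, GPU_POWER_KW, GRID_KG_PER_KWH)
print(f"{energy:.1f} kWh -> {emissions:.1f} kg CO2 eq.")
```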

wuerstchen/decoder/config.json DELETED

@@ -1,43 +0,0 @@
-{
-  "_class_name": "WuerstchenDiffNeXt",
-  "_diffusers_version": "0.21.0.dev0",
-  "blocks": [
-    4,
-    4,
-    14,
-    4
-  ],
-  "c_cond": 1024,
-  "c_hidden": [
-    320,
-    640,
-    1280,
-    1280
-  ],
-  "c_in": 4,
-  "c_out": 4,
-  "c_r": 64,
-  "clip_embd": 1024,
-  "dropout": 0.1,
-  "effnet_embd": 16,
-  "inject_effnet": [
-    false,
-    true,
-    true,
-    true
-  ],
-  "kernel_size": 3,
-  "level_config": [
-    "CT",
-    "CTA",
-    "CTA",
-    "CTA"
-  ],
-  "nhead": [
-    -1,
-    10,
-    20,
-    20
-  ],
-  "patch_size": 2
-}
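
Each list in the decoder config has one entry per resolution level, so the levels can be read column by column. A small sketch that does this, assuming (an interpretation, not stated in the config) that an `A` in `level_config` marks an attention block and that `nhead = -1` is a placeholder for the one level without attention. Under that reading, every attention level works out to 64 channels per head. `describe_levels` is a hypothetical helper.

```python
import json

# Per-level fields copied from the deleted decoder/config.json.
DECODER_CONFIG = json.loads("""
{
  "blocks": [4, 4, 14, 4],
  "c_hidden": [320, 640, 1280, 1280],
  "inject_effnet": [false, true, true, true],
  "level_config": ["CT", "CTA", "CTA", "CTA"],
  "nhead": [-1, 10, 20, 20]
}
""")


def describe_levels(cfg: dict) -> list:
    """Summarize each resolution level by reading the per-level lists in parallel."""
    summary = []
    for i, code in enumerate(cfg["level_config"]):
        channels = cfg["c_hidden"][i]
        heads = cfg["nhead"][i]
        # Assumption: 'A' in the level code marks an attention block.
        has_attention = "A" in code
        if has_attention and heads <= 0:
            raise ValueError(f"level {i} has attention but nhead={heads}")
        summary.append({
            "level": i,
            "channels": channels,
            "blocks": cfg["blocks"][i],
            "attention": has_attention,
            # Channels per attention head; None where the level has no attention.
            "head_dim": channels // heads if has_attention else None,
            "effnet_injected": cfg["inject_effnet"][i],
        })
    return summary


for level in describe_levels(DECODER_CONFIG):
    print(level)
```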

wuerstchen/decoder/diffusion_pytorch_model.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b2e99829fe0a2c946ec6b4ef6979aee78bfaa05f87b0cf7b80ecafa20272ef60
-size 4221843094

wuerstchen/decoder/diffusion_pytorch_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1510c2cc1a891df02d61d79866c40c506e9099519829e0282c2a79d7e9c7e66f
-size 4221568336
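
The weight files in this commit are Git LFS pointers, so each deleted "file" is only a spec version, a SHA-256 object id and a byte size (about 4.2 GB for the decoder safetensors). A sketch for checking a locally downloaded copy against such a pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Pointer text from the deleted decoder/diffusion_pytorch_model.safetensors.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1510c2cc1a891df02d61d79866c40c506e9099519829e0282c2a79d7e9c7e66f
size 4221568336
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256 digest."""
    # Cheap check first: a size mismatch means there is no need to hash the file.
    if path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so multi-gigabyte weights never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

For a real check, call `matches_pointer(Path("diffusion_pytorch_model.safetensors"), info)` with the path of your own download (the path here is hypothetical).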

wuerstchen/model_index.json DELETED

@@ -1,25 +0,0 @@
-{
-  "_class_name": "WuerstchenDecoderPipeline",
-  "_diffusers_version": "0.21.0.dev0",
-  "decoder": [
-    "wuerstchen",
-    "WuerstchenDiffNeXt"
-  ],
-  "latent_dim_scale": 10.67,
-  "scheduler": [
-    "diffusers",
-    "DDPMWuerstchenScheduler"
-  ],
-  "text_encoder": [
-    "transformers",
-    "CLIPTextModel"
-  ],
-  "tokenizer": [
-    "transformers",
-    "CLIPTokenizerFast"
-  ],
-  "vqgan": [
-    "wuerstchen",
-    "PaellaVQModel"
-  ]
-}
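
`latent_dim_scale` relates the spatial size of the Stage C image embeddings to the Stage B latent grid. The sketch below traces one image size through all three stages. Two values are assumptions not found in this commit: a Stage C downsampling factor of 42.67 (so that 1024 px maps to a 24 x 24 grid) and a 4x per-side downsampling in Stage A. With those, 10.67 x 4 ≈ 42.7, in line with the README's 42x figure. `latent_shapes` is a hypothetical helper, not a diffusers function.

```python
import math

LATENT_DIM_SCALE = 10.67           # from model_index.json
PRIOR_RESOLUTION_MULTIPLE = 42.67  # assumption: Stage C downsampling factor
VQGAN_DOWNSCALE = 4                # assumption: Stage A downsampling per side


def latent_shapes(height: int, width: int) -> dict:
    """Trace spatial sizes through Stage C -> Stage B -> Stage A for one image."""
    # Stage C (prior) grid; round up so the decoded image covers the requested size.
    c_h = math.ceil(height / PRIOR_RESOLUTION_MULTIPLE)
    c_w = math.ceil(width / PRIOR_RESOLUTION_MULTIPLE)
    # Stage B (decoder) grid: the prior grid scaled by latent_dim_scale.
    b_h = int(c_h * LATENT_DIM_SCALE)
    b_w = int(c_w * LATENT_DIM_SCALE)
    # Stage A (VQGAN): each latent position corresponds to a 4x4 pixel area.
    return {"stage_c": (c_h, c_w), "stage_b": (b_h, b_w),
            "pixels": (b_h * VQGAN_DOWNSCALE, b_w * VQGAN_DOWNSCALE)}


for size in [(1024, 1024), (1536, 1536), (1024, 2048)]:
    print(size, latent_shapes(*size))
```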

wuerstchen/scheduler/scheduler_config.json DELETED

@@ -1,6 +0,0 @@
-{
-  "_class_name": "DDPMWuerstchenScheduler",
-  "_diffusers_version": "0.21.0.dev0",
-  "s": 0.008,
-  "scaler": 1.0
-}
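
The value `s = 0.008` matches the offset of the cosine noise schedule from Nichol & Dhariwal (2021), where the cumulative signal fraction is alpha_bar(t) = f(t) / f(0) with f(t) = cos²(((t + s) / (1 + s)) · π/2). A sketch of that schedule over continuous time t in [0, 1], assuming `DDPMWuerstchenScheduler` follows this form; it is not a reproduction of the diffusers class, which may add clamping or time warping. With `scaler` set to 1.0 here, we assume no warping is applied.

```python
import math

S = 0.008  # "s" from scheduler_config.json


def alpha_bar(t: float, s: float = S) -> float:
    """Cumulative signal fraction at continuous time t in [0, 1] under a cosine schedule."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    # Dividing by f(0) normalizes the schedule so that alpha_bar(0) == 1.
    return f(t) / f(0.0)


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  alpha_bar={alpha_bar(t):.4f}")
```

The small offset `s` keeps the noise level near t = 0 from being vanishingly small, which is the motivation Nichol & Dhariwal give for it.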

wuerstchen/text_encoder/config.json DELETED

@@ -1,25 +0,0 @@
-{
-  "_name_or_path": "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
-  "architectures": [
-    "CLIPTextModel"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 0,
-  "dropout": 0.0,
-  "eos_token_id": 2,
-  "hidden_act": "gelu",
-  "hidden_size": 1024,
-  "initializer_factor": 1.0,
-  "initializer_range": 0.02,
-  "intermediate_size": 4096,
-  "layer_norm_eps": 1e-05,
-  "max_position_embeddings": 77,
-  "model_type": "clip_text_model",
-  "num_attention_heads": 16,
-  "num_hidden_layers": 24,
-  "pad_token_id": 1,
-  "projection_dim": 1024,
-  "torch_dtype": "float32",
-  "transformers_version": "4.33.0.dev0",
-  "vocab_size": 49408
-}

wuerstchen/text_encoder/model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:bd94a7ea6922e8028227567fe14e04d2989eec31c482e0813e9006afea6637f1
-size 1411983168

wuerstchen/text_encoder/pytorch_model.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:0483b11b48b0f5a5079f778c0df4057d7b797cf58ef176087ec03a236d3e16e0
-size 1412064410

wuerstchen/tokenizer/merges.txt DELETED

The diff for this file is too large to render.

wuerstchen/tokenizer/special_tokens_map.json DELETED

@@ -1,24 +0,0 @@
-{
-  "bos_token": {
-    "content": "<|startoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "eos_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": "<|endoftext|>",
-  "unk_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  }
-}

wuerstchen/tokenizer/tokenizer.json DELETED

The diff for this file is too large to render.

wuerstchen/tokenizer/tokenizer_config.json DELETED

@@ -1,33 +0,0 @@
-{
-  "add_prefix_space": false,
-  "bos_token": {
-    "__type": "AddedToken",
-    "content": "<|startoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "clean_up_tokenization_spaces": true,
-  "do_lower_case": true,
-  "eos_token": {
-    "__type": "AddedToken",
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "errors": "replace",
-  "model_max_length": 77,
-  "pad_token": "<|endoftext|>",
-  "tokenizer_class": "CLIPTokenizer",
-  "unk_token": {
-    "__type": "AddedToken",
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  }
-}
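
Several fields in the text encoder and tokenizer configs have to agree with each other and with the decoder config. A sketch of those checks, with values copied from the JSON files in this commit; treating the decoder's `clip_embd` as the width of the text encoder's hidden states is an assumption.

```python
# Values copied from the deleted text_encoder/config.json, tokenizer_config.json
# and decoder/config.json.
text_encoder = {"hidden_size": 1024, "num_attention_heads": 16, "max_position_embeddings": 77}
tokenizer = {"model_max_length": 77}
decoder = {"clip_embd": 1024}

checks = {
    # Assumption: the decoder's clip_embd is the width of the text encoder's hidden states.
    "text encoder width matches decoder clip_embd":
        text_encoder["hidden_size"] == decoder["clip_embd"],
    # Sequences of up to model_max_length tokens must fit the 77-entry position table.
    "tokenizer length fits position embeddings":
        tokenizer["model_max_length"] <= text_encoder["max_position_embeddings"],
    # Multi-head attention splits hidden_size evenly across heads.
    "hidden size divides evenly across attention heads":
        text_encoder["hidden_size"] % text_encoder["num_attention_heads"] == 0,
}

for name, ok in checks.items():
    print(f"{'OK ' if ok else 'BAD'} {name}")
```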

wuerstchen/tokenizer/vocab.json DELETED

The diff for this file is too large to render.

wuerstchen/vqgan/config.json DELETED

@@ -1,13 +0,0 @@
-{
-  "_class_name": "PaellaVQModel",
-  "_diffusers_version": "0.21.0.dev0",
-  "bottleneck_blocks": 12,
-  "embed_dim": 384,
-  "in_channels": 3,
-  "latent_channels": 4,
-  "levels": 2,
-  "num_vq_embeddings": 8192,
-  "out_channels": 3,
-  "scale_factor": 0.3764,
-  "up_down_scale_factor": 2
-}
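
The VQGAN config describes a codebook of 8192 entries, each a vector of `latent_channels = 4` values. A rough information count, assuming one codebook index per latent position and a 4x per-side downsampling in Stage A (an assumption, not a value stated in this config), shows how small the Stage A token grid is next to raw pixels: a 1024x1024 image becomes 256 x 256 indices of 13 bits each, about 29.5 times fewer bits than 8-bit RGB. This counts index bits only and does not describe how the pipeline actually passes latents between stages; `token_budget` is a hypothetical helper.

```python
import math

NUM_VQ_EMBEDDINGS = 8192  # codebook size from vqgan/config.json
DOWNSCALE = 4             # assumption: Stage A reduces each spatial side by 4x


def token_budget(height: int, width: int, channels: int = 3, bits_per_channel: int = 8) -> dict:
    """Compare raw 8-bit RGB size with one codebook index per latent position."""
    grid = (height // DOWNSCALE, width // DOWNSCALE)
    # 8192 = 2**13, so each index needs 13 bits.
    bits_per_token = math.log2(NUM_VQ_EMBEDDINGS)
    token_bits = grid[0] * grid[1] * bits_per_token
    raw_bits = height * width * channels * bits_per_channel
    return {"grid": grid, "bits_per_token": bits_per_token, "ratio": raw_bits / token_bits}


print(token_budget(1024, 1024))
```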

wuerstchen/vqgan/diffusion_pytorch_model.bin DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:f3ab7752b474058d177e8565860367a438b8016ba788954394fbb7f1da16d6e1
-size 73674142

wuerstchen/vqgan/diffusion_pytorch_model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:052db8852c0d8b117e6d2a59ae3e0c7d7aaae3d00f247e392ef8e9837e11d6c4
-size 73639568