apolinario committed on
Commit
3950160
1 Parent(s): 286a37a

Remove garbage and do the whole description

0progress.png DELETED
Binary file (62.2 kB)
 
25progress.png DELETED
Binary file (121 kB)
 
9x4Wfge4acZih9cej6czff.png DELETED
Binary file (123 kB)
 
ESKN3oxvDsrvfegKWBC3RU.png DELETED
Binary file (131 kB)
 
app.py CHANGED
@@ -2350,8 +2350,6 @@ iface = gr.Interface(
  ],
  outputs=image,
  title="Generate images from text with VQGAN+CLIP (Hypertron v2)",
- #description="<div>By typing a prompt and pressing submit you can generate images based on this prompt. <a href='https://github.com/CompVis/latent-diffusion' target='_blank'>Latent Diffusion</a> is a text-to-image model created by <a href='https://github.com/CompVis' target='_blank'>CompVis</a>, trained on the <a href='https://laion.ai/laion-400-open-dataset/'>LAION-400M dataset.</a><br>This UI to the model was assembled by <a style='color: rgb(245, 158, 11);font-weight:bold' href='https://twitter.com/multimodalart' target='_blank'>@multimodalart</a></div>",
- #article="<h4 style='font-size: 110%;margin-top:.5em'>Biases acknowledgment</h4><div>Despite how impressive being able to turn text into image is, beware to the fact that this model may output content that reinforces or exarcbates societal biases. According to the <a href='https://arxiv.org/abs/2112.10752' target='_blank'>Latent Diffusion paper</a>:<i> \"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\"</i>. The model was trained on an unfiltered version the LAION-400M dataset, which scrapped non-curated image-text-pairs from the internet (the exception being the the removal of illegal content) and is meant to be used for research purposes, such as this one. <a href='https://laion.ai/laion-400-open-dataset/' target='_blank'>You can read more on LAION's website</a></div><h4 style='font-size: 110%;margin-top:1em'>Who owns the images produced by this demo?</h4><div>Definetly not me! Probably you do. I say probably because the Copyright discussion about AI generated art is ongoing. So <a href='https://www.theverge.com/2022/2/21/22944335/us-copyright-office-reject-ai-generated-art-recent-entrance-to-paradise' target='_blank'>it may be the case that everything produced here falls automatically into the public domain</a>. But in any case it is either yours or is in the public domain.</div>"
  description="<div>By typing a prompt and pressing submit you can generate images based on this prompt. <a href='https://arxiv.org/abs/2204.08583' target='_blank'>VQGAN+CLIP</a> is a combination of a GAN and CLIP, as explained here. This approach innagurated the open source AI art scene, and the Hypertron v2 implementation compiles many improvements.</a><br>This Spaces UI to the model was assembled by <a style='color: rgb(99, 102, 241);font-weight:bold' href='https://twitter.com/multimodalart' target='_blank'>@multimodalart</a>, keep up with the <a style='color: rgb(99, 102, 241);' href='https://multimodal.art/news' target='_blank'>latest multimodal ai art news here</a> and consider <a style='color: rgb(99, 102, 241);' href='https://www.patreon.com/multimodalart' target='_blank'>supporting us on Patreon</a>",
  article="<h4 style='font-size: 110%;margin-top:.5em'>Biases acknowledgment</h4><div>Despite how impressive being able to turn text into image is, beware to the fact that this model may output content that reinforces or exarcbates societal biases. According to the <a href='https://arxiv.org/abs/2112.10752' target='_blank'>Latent Diffusion paper</a>:<i> \"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\"</i>. The model was trained on both the Imagenet dataset and in an undisclosed dataset by OpenAI.</div><h4 style='font-size: 110%;margin-top:1em'>Who owns the images produced by this demo?</h4><div>Definetly not me! Probably you do. I say probably because the Copyright discussion about AI generated art is ongoing. So <a href='https://www.theverge.com/2022/2/21/22944335/us-copyright-office-reject-ai-generated-art-recent-entrance-to-paradise' target='_blank'>it may be the case that everything produced here falls automatically into the public domain</a>. But in any case it is either yours or is in the public domain.</div>"
  )
 
igZbgSaM3wBbUx6fgM3sxr.png DELETED
Binary file (79.3 kB)
 
o45KGFnaYGMYPoFTJjTp7w.png DELETED
Binary file (115 kB)
 
progress.png DELETED
Binary file (131 kB)
 
second_attempt.py DELETED
@@ -1,2017 +0,0 @@
- # Hypertron v2
- # Original file is located at https://colab.research.google.com/drive/1N4UNSbtNMd31N_gAT9rAm8ZzPh62Y5ud
- import sys
-
- sys.stdout.write("Imports ...\n")
- sys.stdout.flush()
-
- sys.path.append('./CLIP')
- sys.path.append('./taming-transformers')
-
- import os
- os.environ["XDG_CACHE_HOME"] = "../../.cache"
- from huggingface_hub import hf_hub_download
- import gradio as gr
- from CLIP import clip
- from omegaconf import OmegaConf
- from taming.models import cond_transformer, vqgan
- import torch
- vqgan_model = hf_hub_download(repo_id="boris/vqgan_f16_16384", filename="model.ckpt")
- vqgan_config = hf_hub_download(repo_id="boris/vqgan_f16_16384", filename="config.yaml")
-
-
- device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
- print("Using device:", device)
-
-
- perceptor = (
-     clip.load("ViT-B/32", jit=False)[0]
-     .eval()
-     .requires_grad_(False)
-     .to(device)
- )
33
- def run_all(user_input,num_steps, template, width,height):
34
- global model
35
- global perceptor
36
- import argparse
37
- import math
38
- from pathlib import Path
39
- import sys
40
- import pandas as pd
41
- from IPython import display
42
- from base64 import b64encode
43
-
44
- from PIL import Image
45
-
46
- import torch
47
- from torch import nn
48
- from torch import optim
50
- from torch.nn import functional as F
51
- from torchvision import transforms
52
- from torchvision.transforms import functional as TF
53
- import torchvision.transforms as T
54
- from tqdm.notebook import tqdm
55
-
56
- import kornia.augmentation as K
57
- import numpy as np
58
- import subprocess
59
- import imageio
60
- from PIL import ImageFile, Image
61
- #ImageFile.LOAD_TRUNCATED_IMAGES = True
62
- import hashlib
63
- from PIL.PngImagePlugin import PngImageFile, PngInfo
64
- import json
65
- import IPython
66
- from IPython.display import Markdown, display, Image, clear_output
67
- import urllib.request
68
- import random
69
-
70
- sys.stdout.write("Parsing arguments ...\n")
71
- sys.stdout.flush()
72
-
73
- def parse_args():
74
- desc = "Blah"
75
- parser = argparse.ArgumentParser(description=desc)
76
- parser.add_argument('--prompt', type=str, help='Text to generate image from.')
77
- parser.add_argument('--seed', type=int, help='Random seed.')
78
- parser.add_argument('--sizex', type=int, help='Image width.')
79
- parser.add_argument('--sizey', type=int, help='Image height.')
80
- parser.add_argument('--flavor', type=str, help='Flavor.')
81
- parser.add_argument('--template', type=str, help='Template.')
82
- parser.add_argument('--iterations', type=int, help='Iterations')
83
- parser.add_argument('--mse', type=int, help='Use MSE')
84
- parser.add_argument('--update', type=int, help='Update every n iterations.')
85
- parser.add_argument('--clip_model_1', type=str, help='CLIP model 1 to load.')
86
- parser.add_argument('--clip_model_2', type=str, help='CLIP model 2 to load.')
87
- parser.add_argument('--clip_model_3', type=str, help='CLIP model 3 to load.')
88
- parser.add_argument('--clip_model_4', type=str, help='CLIP model 4 to load.')
89
- parser.add_argument('--clip_model_5', type=str, help='CLIP model 5 to load.')
90
- parser.add_argument('--clip_model_6', type=str, help='CLIP model 6 to load.')
91
- parser.add_argument('--vqgan_model', type=str, help='VQGAN model to load.')
92
- parser.add_argument('--seed_image', type=str, help='Initial seed image.', default=None)
93
- parser.add_argument('--image_file', type=str, help='Output image name.')
94
- parser.add_argument('--frame_dir', type=str, help='Save frame file directory.')
95
- args = parser.parse_args()
96
- return args
97
-
98
- image_path = None
99
- flavor = 'cumin'
100
- #args2=parse_args();
101
- args2 = argparse.Namespace(
102
- prompt=user_input,
103
- seed=int(random.randint(0, 2147483647)),
104
- sizex=width,
105
- sizey=height,
106
- flavor=flavor,
107
- iterations=num_steps,
108
- mse=True,
109
- update=100,
110
- template=template,
111
- vqgan_model='ImageNet 16384',
112
- seed_image=image_path,
113
- image_file="progress.png",
114
- #frame_dir=intermediary_folder,
115
- )
116
- if args2.seed is not None:
117
- sys.stdout.write(f'Setting seed to {args2.seed} ...\n')
118
- sys.stdout.flush()
119
- import numpy as np
120
- np.random.seed(args2.seed)
121
- import random
122
- random.seed(args2.seed)
123
- #next line forces deterministic random values, but causes other issues with resampling (uncomment to see)
124
- #torch.use_deterministic_algorithms(True)
125
- torch.manual_seed(args2.seed)
126
- torch.cuda.manual_seed(args2.seed)
127
- torch.cuda.manual_seed_all(args2.seed)
128
- torch.backends.cudnn.deterministic = True
129
- torch.backends.cudnn.benchmark = False
130
-
131
-
132
-
133
-
134
-
135
- """
136
- from imgtag import ImgTag # metadata
137
- from libxmp import * # metadata
138
- import libxmp # metadata
139
- from stegano import lsb
140
- import gc
141
- import GPUtil as GPU
142
- """
143
-
144
- device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
145
- print('Using device:', device)
146
-
147
-
148
- def noise_gen(shape, octaves=5):
-     n, c, h, w = shape
-     noise = torch.zeros([n, c, 1, 1])
-     # range() needs an int, so floor the log2-based octave limit
-     max_octaves = int(min(octaves, math.log2(h), math.log2(w)))
-     for i in reversed(range(max_octaves)):
-         h_cur, w_cur = h // 2**i, w // 2**i
-         noise = F.interpolate(noise, (h_cur, w_cur), mode='bicubic', align_corners=False)
-         noise += torch.randn([n, c, h_cur, w_cur]) / 5
-     return noise
157
-
158
- def sinc(x):
159
- return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))
160
-
161
-
162
- def lanczos(x, a):
163
- cond = torch.logical_and(-a < x, x < a)
164
- out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))
165
- return out / out.sum()
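
The `sinc` and `lanczos` helpers above build a windowed-sinc kernel and normalize it to sum to 1. A minimal standard-library sketch of the same arithmetic follows; `lanczos_weights` is a hypothetical name used only for illustration, and it works on plain lists rather than tensors.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), defined as 1 at x == 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def lanczos_weights(xs, a):
    """Lanczos window of half-width `a` sampled at `xs`, scaled to sum to 1."""
    raw = [sinc(x) * sinc(x / a) if -a < x < a else 0.0 for x in xs]
    total = sum(raw)
    return [w / total for w in raw]


# Seven taps spaced 0.5 apart with a = 2; the centre tap carries the most weight.
weights = lanczos_weights([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5], 2)
print([round(w, 4) for w in weights])
```

Normalizing by the sum keeps overall brightness unchanged when the kernel is used as a low-pass filter before downsampling, which is how `resample` uses it.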
166
-
167
-
168
- def ramp(ratio, width):
169
- n = math.ceil(width / ratio + 1)
170
- out = torch.empty([n])
171
- cur = 0
172
- for i in range(out.shape[0]):
173
- out[i] = cur
174
- cur += ratio
175
- return torch.cat([-out[1:].flip([0]), out])[1:-1]
176
-
177
-
178
- def resample(input, size, align_corners=True):
179
- n, c, h, w = input.shape
180
- dh, dw = size
181
-
182
- input = input.view([n * c, 1, h, w])
183
-
184
- if dh < h:
185
- kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)
186
- pad_h = (kernel_h.shape[0] - 1) // 2
187
- input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')
188
- input = F.conv2d(input, kernel_h[None, None, :, None])
189
-
190
- if dw < w:
191
- kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)
192
- pad_w = (kernel_w.shape[0] - 1) // 2
193
- input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')
194
- input = F.conv2d(input, kernel_w[None, None, None, :])
195
-
196
- input = input.view([n, c, h, w])
197
- return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)
198
-
199
- def lerp(a, b, f):
200
- return (a * (1.0 - f)) + (b * f)
201
-
202
- class ReplaceGrad(torch.autograd.Function):
203
- @staticmethod
204
- def forward(ctx, x_forward, x_backward):
205
- ctx.shape = x_backward.shape
206
- return x_forward
207
-
208
- @staticmethod
209
- def backward(ctx, grad_in):
210
- return None, grad_in.sum_to_size(ctx.shape)
211
-
212
-
213
- replace_grad = ReplaceGrad.apply
214
-
215
-
216
- class ClampWithGrad(torch.autograd.Function):
217
- @staticmethod
218
- def forward(ctx, input, min, max):
219
- ctx.min = min
220
- ctx.max = max
221
- ctx.save_for_backward(input)
222
- return input.clamp(min, max)
223
-
224
- @staticmethod
225
- def backward(ctx, grad_in):
226
- input, = ctx.saved_tensors
227
- return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None
228
-
229
-
230
- clamp_with_grad = ClampWithGrad.apply
231
-
232
-
233
- def vector_quantize(x, codebook):
234
- d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T
235
- indices = d.argmin(-1)
236
- x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook
237
- return replace_grad(x_q, x)
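
`vector_quantize` above picks, for each input vector, the codebook row with the smallest squared distance, using the expansion |x|^2 + |c|^2 - 2 x·c. The sketch below shows that lookup for a single vector with plain lists. `nearest_code` is a hypothetical helper, and the straight-through gradient that `replace_grad` provides is outside its scope.

```python
def nearest_code(x, codebook):
    """Return (index, row) of the codebook row closest to x in squared L2 distance."""
    x_sq = sum(v * v for v in x)

    def sq_dist(c):
        # |x - c|^2 expanded as |x|^2 + |c|^2 - 2 * (x . c)
        return x_sq + sum(v * v for v in c) - 2 * sum(a * b for a, b in zip(x, c))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
print(nearest_code([0.9, 1.2], codebook))
print(nearest_code([3.0, 0.1], codebook))
```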
238
-
239
-
240
- class Prompt(nn.Module):
241
- def __init__(self, embed, weight=1., stop=float('-inf')):
242
- super().__init__()
243
- self.register_buffer('embed', embed)
244
- self.register_buffer('weight', torch.as_tensor(weight))
245
- self.register_buffer('stop', torch.as_tensor(stop))
246
-
247
- def forward(self, input):
248
- input_normed = F.normalize(input.unsqueeze(1), dim=2)
249
- embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)
250
- dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)
251
- dists = dists * self.weight.sign()
252
- return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()
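
The `dists` expression in `Prompt.forward` can be read as a squared great-circle distance. For unit vectors separated by angle t, the chord length is 2*sin(t/2), so `arcsin(chord / 2)` recovers t/2 and the whole expression equals t^2 / 2. The sketch below checks that identity with the standard library only; the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spherical_dist(a, b):
    """Chord-based form: same arithmetic as the `dists` line, for two plain vectors."""
    a, b = normalize(a), normalize(b)
    chord = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # The clamp guards against chord / 2 drifting just above 1.0 from rounding.
    return 2 * math.asin(min(chord / 2, 1.0)) ** 2


def half_angle_squared(a, b):
    """Angle-based form: t^2 / 2, where t is the angle between a and b."""
    a, b = normalize(a), normalize(b)
    cos_t = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(cos_t) ** 2 / 2


u, v = [1.0, 2.0, 3.0], [3.0, 1.0, 0.0]
print(spherical_dist(u, v), half_angle_squared(u, v))
```

One likely reason to prefer the chord form is numerical: `acos` loses precision for nearly parallel vectors, while the `arcsin` of a small chord does not.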
253
-
254
-
255
- #def parse_prompt(prompt):
256
- # vals = prompt.rsplit(':', 2)
257
- # vals = vals + ['', '1', '-inf'][len(vals):]
258
- # return vals[0], float(vals[1]), float(vals[2])
259
-
260
- def parse_prompt(prompt):
-     if prompt.startswith('http://') or prompt.startswith('https://'):
-         # split off at most one trailing ':weight', then rejoin the scheme's own colon
-         vals = prompt.rsplit(':', 2)
-         vals = [vals[0] + ':' + vals[1], *vals[2:]]
-     else:
-         vals = prompt.rsplit(':', 1)
-     vals = vals + ['', '1', '-inf'][len(vals):]
-     return vals[0], float(vals[1]), float(vals[2])
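
`parse_prompt` above implements a `text:weight` convention in which the weight is optional and defaults to 1. As an illustration of that convention only, here is a hypothetical, more forgiving variant: `split_weighted_prompt` is not part of the deleted file, and unlike `parse_prompt` it treats a non-numeric tail as part of the text instead of raising.

```python
def split_weighted_prompt(prompt, default_weight=1.0):
    """Split 'text:weight' on the last colon; the weight is optional."""
    text, sep, tail = prompt.rpartition(':')
    if sep:
        try:
            return text, float(tail)
        except ValueError:
            pass  # the tail is not a number, so the colon belongs to the text
    return prompt, default_weight


print(split_weighted_prompt("a red barn:2"))
print(split_weighted_prompt("a red barn"))
print(split_weighted_prompt("https://example.com/a.png:0.5"))
```

A prompt whose text legitimately ends in a colon followed by a number (for example an aspect ratio) would still be read as a weight; that ambiguity is inherent to the convention.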
268
-
269
- def one_sided_clip_loss(input, target, labels=None, logit_scale=100):
270
- input_normed = F.normalize(input, dim=-1)
271
- target_normed = F.normalize(target, dim=-1)
272
- logits = input_normed @ target_normed.T * logit_scale
273
- if labels is None:
274
- labels = torch.arange(len(input), device=logits.device)
275
- return F.cross_entropy(logits, labels)
276
-
277
- class EMATensor(nn.Module):
278
- """Implemented by Katherine Crowson"""
279
- def __init__(self, tensor, decay):
280
- super().__init__()
281
- self.tensor = nn.Parameter(tensor)
282
- self.register_buffer('biased', torch.zeros_like(tensor))
283
- self.register_buffer('average', torch.zeros_like(tensor))
284
- self.decay = decay
285
- self.register_buffer('accum', torch.tensor(1.))
286
- self.update()
287
-
288
- @torch.no_grad()
289
- def update(self):
290
- if not self.training:
291
- raise RuntimeError('update() should only be called during training')
292
-
293
- self.accum *= self.decay
294
- self.biased.mul_(self.decay)
295
- self.biased.add_((1 - self.decay) * self.tensor)
296
- self.average.copy_(self.biased)
297
- self.average.div_(1 - self.accum)
298
-
299
- def forward(self):
300
- if self.training:
301
- return self.tensor
302
- return self.average
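
`EMATensor.update` keeps a zero-initialized running average and divides it by `1 - accum`, where `accum` is the decay raised to the number of updates so far. That division removes the bias toward zero in early steps. A scalar sketch of the same bookkeeping, with the hypothetical class name `ScalarEMA`:

```python
class ScalarEMA:
    """Bias-corrected exponential moving average of a stream of floats."""

    def __init__(self, decay):
        self.decay = decay
        self.biased = 0.0   # raw EMA, starts at zero and is therefore biased low
        self.accum = 1.0    # decay ** number_of_updates
        self.average = 0.0  # bias-corrected estimate

    def update(self, value):
        self.accum *= self.decay
        self.biased = self.biased * self.decay + (1 - self.decay) * value
        self.average = self.biased / (1 - self.accum)
        return self.average


ema = ScalarEMA(decay=0.99)
# After a single update the corrected average already matches the input,
# whereas the raw `biased` value is only about 1% of it.
print(ema.update(5.0), ema.biased)
```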
303
-
304
-
305
- ############################################################################################
306
- ############################################################################################
307
-
308
-
309
- class MakeCutoutsJuu(nn.Module):
310
- def __init__(self, cut_size, cutn, cut_pow, augs):
311
- super().__init__()
312
- self.cut_size = cut_size
313
- self.cutn = cutn
314
- self.cut_pow = cut_pow
315
- self.augs = nn.Sequential(
316
- #K.RandomGaussianNoise(mean=0.0, std=0.5, p=0.1),
317
- K.RandomHorizontalFlip(p=0.5),
318
- K.RandomSharpness(0.3,p=0.4),
319
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
320
- K.RandomPerspective(0.2,p=0.4),
321
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
322
- K.RandomGrayscale(p=0.1),
323
- )
324
- self.noise_fac = 0.1
325
-
326
- def forward(self, input):
327
- sideY, sideX = input.shape[2:4]
328
- max_size = min(sideX, sideY)
329
- min_size = min(sideX, sideY, self.cut_size)
330
- cutouts = []
331
- for _ in range(self.cutn):
332
- size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
333
- offsetx = torch.randint(0, sideX - size + 1, ())
334
- offsety = torch.randint(0, sideY - size + 1, ())
335
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
336
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
337
- batch = self.augs(torch.cat(cutouts, dim=0))
338
- if self.noise_fac:
339
- facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)
340
- batch = batch + facs * torch.randn_like(batch)
341
- return batch
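
The loop in `MakeCutoutsJuu.forward` draws a square crop whose side lies between `min_size` and `max_size`, with `cut_pow` skewing the draw, and then picks an offset that keeps the crop inside the image. The sketch below reproduces only that sampling step with the standard library. `sample_cutout_box` is a hypothetical helper, and it uses Python's `random` in place of `torch.rand` and `torch.randint`.

```python
import random


def sample_cutout_box(side_x, side_y, cut_size, cut_pow, rng=random):
    """Return (offset_x, offset_y, size) for one random square crop."""
    max_size = min(side_x, side_y)
    min_size = min(side_x, side_y, cut_size)
    # u ** cut_pow with u in [0, 1): a cut_pow above 1 favours smaller crops.
    size = int(rng.random() ** cut_pow * (max_size - min_size) + min_size)
    # randint is inclusive at both ends, matching torch.randint(0, side - size + 1, ()).
    offset_x = rng.randint(0, side_x - size)
    offset_y = rng.randint(0, side_y - size)
    return offset_x, offset_y, size


rng = random.Random(0)
print([sample_cutout_box(640, 480, 224, 1.0, rng) for _ in range(3)])
```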
342
-
343
- class MakeCutoutsMoth(nn.Module):
344
- def __init__(self, cut_size, cutn, cut_pow, augs, skip_augs=False):
345
- super().__init__()
346
- self.cut_size = cut_size
347
- self.cutn = cutn
348
- self.cut_pow = cut_pow
349
- self.skip_augs = skip_augs
350
- self.augs = T.Compose([
351
- T.RandomHorizontalFlip(p=0.5),
352
- T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),
353
- T.RandomAffine(degrees=15, translate=(0.1, 0.1)),
354
- T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),
355
- T.RandomPerspective(distortion_scale=0.4, p=0.7),
356
- T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),
357
- T.RandomGrayscale(p=0.15),
358
- T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),
359
- # T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
360
- ])
361
-
362
- def forward(self, input):
363
- input = T.Pad(input.shape[2]//4, fill=0)(input)
364
- sideY, sideX = input.shape[2:4]
365
- max_size = min(sideX, sideY)
366
-
367
- cutouts = []
368
- for ch in range(self.cutn):
- if ch > self.cutn - self.cutn//4:
370
- cutout = input.clone()
371
- else:
372
- size = int(max_size * torch.zeros(1,).normal_(mean=.8, std=.3).clip(float(self.cut_size/max_size), 1.))
373
- offsetx = torch.randint(0, abs(sideX - size + 1), ())
374
- offsety = torch.randint(0, abs(sideY - size + 1), ())
375
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
376
-
377
- if not self.skip_augs:
378
- cutout = self.augs(cutout)
379
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
380
- del cutout
381
-
382
- cutouts = torch.cat(cutouts, dim=0)
383
- return cutouts
384
-
385
- class MakeCutoutsAaron(nn.Module):
386
- def __init__(self, cut_size, cutn, cut_pow, augs):
387
- super().__init__()
388
- self.cut_size = cut_size
389
- self.cutn = cutn
390
- self.cut_pow = cut_pow
391
- self.augs = augs
392
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
393
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
394
-
395
- def set_cut_pow(self, cut_pow):
396
- self.cut_pow = cut_pow
397
-
398
- def forward(self, input):
399
- sideY, sideX = input.shape[2:4]
400
- max_size = min(sideX, sideY)
401
- min_size = min(sideX, sideY, self.cut_size)
402
- cutouts = []
403
- cutouts_full = []
404
-
405
- min_size_width = min(sideX, sideY)
406
- lower_bound = float(self.cut_size/min_size_width)
407
-
408
- for ii in range(self.cutn):
409
- size = int(min_size_width*torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound, 1.)) # replace .5 with a result for 224 the default large size is .95
410
-
411
- offsetx = torch.randint(0, sideX - size + 1, ())
412
- offsety = torch.randint(0, sideY - size + 1, ())
413
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
414
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
415
-
416
- cutouts = torch.cat(cutouts, dim=0)
417
-
418
- return clamp_with_grad(cutouts, 0, 1)
419
-
420
- class MakeCutoutsCumin(nn.Module):
421
- #from https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ
422
- def __init__(self, cut_size, cutn, cut_pow, augs):
423
- super().__init__()
424
- self.cut_size = cut_size
425
- tqdm.write(f'cut size: {self.cut_size}')
426
- self.cutn = cutn
427
- self.cut_pow = cut_pow
428
- self.noise_fac = 0.1
429
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
430
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
431
- self.augs = nn.Sequential(
432
- #K.RandomHorizontalFlip(p=0.5),
433
- #K.RandomSharpness(0.3,p=0.4),
434
- #K.RandomGaussianBlur((3,3),(10.5,10.5),p=0.2),
435
- #K.RandomGaussianNoise(p=0.5),
436
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
437
- K.RandomAffine(degrees=15, translate=0.1, p=0.7, padding_mode='border'),
438
- K.RandomPerspective(0.7,p=0.7),
439
- K.ColorJitter(hue=0.1, saturation=0.1, p=0.7),
440
- K.RandomErasing((.1, .4), (.3, 1/.3), same_on_batch=True, p=0.7),)
441
-
442
- def set_cut_pow(self, cut_pow):
443
- self.cut_pow = cut_pow
444
-
445
- def forward(self, input):
446
- sideY, sideX = input.shape[2:4]
447
- max_size = min(sideX, sideY)
448
- min_size = min(sideX, sideY, self.cut_size)
449
- cutouts = []
450
- cutouts_full = []
451
- noise_fac = 0.1
452
-
453
-
454
- min_size_width = min(sideX, sideY)
455
- lower_bound = float(self.cut_size/min_size_width)
456
-
457
- for ii in range(self.cutn):
458
-
459
-
460
- # size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
461
- randsize = torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound,1.)
462
- size_mult = randsize ** self.cut_pow
463
- size = int(min_size_width * (size_mult.clip(lower_bound, 1.))) # replace .5 with a result for 224 the default large size is .95
464
- # size = int(min_size_width*torch.zeros(1,).normal_(mean=.9, std=.3).clip(lower_bound, .95)) # replace .5 with a result for 224 the default large size is .95
465
-
466
- offsetx = torch.randint(0, sideX - size + 1, ())
467
- offsety = torch.randint(0, sideY - size + 1, ())
468
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
469
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
470
-
471
-
472
- cutouts = torch.cat(cutouts, dim=0)
473
- cutouts = clamp_with_grad(cutouts, 0, 1)
474
-
475
- #if args.use_augs:
476
- cutouts = self.augs(cutouts)
477
- if self.noise_fac:
478
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
479
- cutouts = cutouts + facs * torch.randn_like(cutouts)
480
- return cutouts
481
-
482
-
483
- class MakeCutoutsHolywater(nn.Module):
484
- def __init__(self, cut_size, cutn, cut_pow, augs):
485
- super().__init__()
486
- self.cut_size = cut_size
487
- tqdm.write(f'cut size: {self.cut_size}')
488
- self.cutn = cutn
489
- self.cut_pow = cut_pow
490
- self.noise_fac = 0.1
491
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
492
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
493
- self.augs = nn.Sequential(
494
- #K.RandomGaussianNoise(mean=0.0, std=0.5, p=0.1),
495
- K.RandomHorizontalFlip(p=0.5),
496
- K.RandomSharpness(0.3,p=0.4),
497
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
498
- K.RandomPerspective(0.2,p=0.4),
499
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
500
- K.RandomGrayscale(p=0.1),
501
- )
502
-
503
- def set_cut_pow(self, cut_pow):
504
- self.cut_pow = cut_pow
505
-
506
- def forward(self, input):
507
- sideY, sideX = input.shape[2:4]
508
- max_size = min(sideX, sideY)
509
- min_size = min(sideX, sideY, self.cut_size)
510
- cutouts = []
511
- cutouts_full = []
512
- noise_fac = 0.1
513
- min_size_width = min(sideX, sideY)
514
- lower_bound = float(self.cut_size/min_size_width)
515
-
516
- for ii in range(self.cutn):
517
- size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
518
- randsize = torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound,1.)
519
- size_mult = randsize ** self.cut_pow * ii + size
520
- size1 = int((min_size_width) * (size_mult.clip(lower_bound, 1.))) # replace .5 with a result for 224 the default large size is .95
521
- size2 = int((min_size_width) * torch.zeros(1,).normal_(mean=.9, std=.3).clip(lower_bound, .95)) # replace .5 with a result for 224 the default large size is .95
522
- offsetx = torch.randint(0, sideX - size1 + 1, ())
523
- offsety = torch.randint(0, sideY - size2 + 1, ())
524
- cutout = input[:, :, offsety:offsety + size2 + ii, offsetx:offsetx + size1 + ii]
525
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
526
-
527
- cutouts = torch.cat(cutouts, dim=0)
528
- cutouts = clamp_with_grad(cutouts, 0, 1)
529
- cutouts = self.augs(cutouts)
530
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
531
- cutouts = cutouts + facs * torch.randn_like(cutouts)
532
- return cutouts
533
-
534
- class MakeCutoutsOldHolywater(nn.Module):
535
- def __init__(self, cut_size, cutn, cut_pow, augs):
536
- super().__init__()
537
- self.cut_size = cut_size
538
- tqdm.write(f'cut size: {self.cut_size}')
539
- self.cutn = cutn
540
- self.cut_pow = cut_pow
541
- self.noise_fac = 0.1
542
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
543
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
544
- self.augs = nn.Sequential(
545
- #K.RandomHorizontalFlip(p=0.5),
546
- #K.RandomSharpness(0.3,p=0.4),
547
- #K.RandomGaussianBlur((3,3),(10.5,10.5),p=0.2),
548
- #K.RandomGaussianNoise(p=0.5),
549
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
550
- K.RandomAffine(degrees=180, translate=0.5, p=0.2, padding_mode='border'),
551
- K.RandomPerspective(0.6,p=0.9),
552
- K.ColorJitter(hue=0.03, saturation=0.01, p=0.1),
553
- K.RandomErasing((.1, .7), (.3, 1/.4), same_on_batch=True, p=0.2),)
554
-
555
- def set_cut_pow(self, cut_pow):
556
- self.cut_pow = cut_pow
557
-
558
- def forward(self, input):
559
- sideY, sideX = input.shape[2:4]
560
- max_size = min(sideX, sideY)
561
- min_size = min(sideX, sideY, self.cut_size)
562
- cutouts = []
563
- cutouts_full = []
564
- noise_fac = 0.1
565
-
566
-
567
- min_size_width = min(sideX, sideY)
568
- lower_bound = float(self.cut_size/min_size_width)
569
-
570
- for ii in range(self.cutn):
571
-
572
-
573
- # size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
574
- randsize = torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound,1.)
575
- size_mult = randsize ** self.cut_pow
576
- size = int(min_size_width * (size_mult.clip(lower_bound, 1.))) # replace .5 with a result for 224 the default large size is .95
577
- # size = int(min_size_width*torch.zeros(1,).normal_(mean=.9, std=.3).clip(lower_bound, .95)) # replace .5 with a result for 224 the default large size is .95
578
-
579
- offsetx = torch.randint(0, sideX - size + 1, ())
580
- offsety = torch.randint(0, sideY - size + 1, ())
581
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
582
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
583
-
584
-
585
- cutouts = torch.cat(cutouts, dim=0)
586
- cutouts = clamp_with_grad(cutouts, 0, 1)
587
-
588
- #if args.use_augs:
589
- cutouts = self.augs(cutouts)
590
- if self.noise_fac:
591
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
592
- cutouts = cutouts + facs * torch.randn_like(cutouts)
593
- return cutouts
594
-
595
-
596
- class MakeCutoutsGinger(nn.Module):
597
- def __init__(self, cut_size, cutn, cut_pow, augs):
598
- super().__init__()
599
- self.cut_size = cut_size
600
- tqdm.write(f'cut size: {self.cut_size}')
601
- self.cutn = cutn
602
- self.cut_pow = cut_pow
603
- self.noise_fac = 0.1
604
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
605
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
606
- self.augs = augs
607
- '''
608
- nn.Sequential(
609
- K.RandomHorizontalFlip(p=0.5),
610
- K.RandomSharpness(0.3,p=0.4),
611
- K.RandomGaussianBlur((3,3),(10.5,10.5),p=0.2),
612
- K.RandomGaussianNoise(p=0.5),
613
- K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
614
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
615
- K.RandomPerspective(0.2,p=0.4, ),
616
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),)
617
- '''
618
-
619
- def set_cut_pow(self, cut_pow):
620
- self.cut_pow = cut_pow
621
-
622
- def forward(self, input):
623
- sideY, sideX = input.shape[2:4]
624
- max_size = min(sideX, sideY)
625
- min_size = min(sideX, sideY, self.cut_size)
626
- cutouts = []
627
- cutouts_full = []
628
- noise_fac = 0.1
629
-
630
-
631
- min_size_width = min(sideX, sideY)
632
- lower_bound = float(self.cut_size/min_size_width)
633
-
634
- for ii in range(self.cutn):
635
-
636
-
637
- # size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
638
- randsize = torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound,1.)
639
- size_mult = randsize ** self.cut_pow
640
- size = int(min_size_width * (size_mult.clip(lower_bound, 1.))) # replace .5 with a result for 224 the default large size is .95
641
- # size = int(min_size_width*torch.zeros(1,).normal_(mean=.9, std=.3).clip(lower_bound, .95)) # replace .5 with a result for 224 the default large size is .95
642
-
643
- offsetx = torch.randint(0, sideX - size + 1, ())
644
- offsety = torch.randint(0, sideY - size + 1, ())
645
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
646
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
647
-
648
-
649
- cutouts = torch.cat(cutouts, dim=0)
650
- cutouts = clamp_with_grad(cutouts, 0, 1)
651
-
652
- #if args.use_augs:
653
- cutouts = self.augs(cutouts)
654
- if self.noise_fac:
655
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
656
- cutouts = cutouts + facs * torch.randn_like(cutouts)
657
- return cutouts
658
-
659
- class MakeCutoutsZynth(nn.Module):
660
- def __init__(self, cut_size, cutn, cut_pow, augs):
661
- super().__init__()
662
- self.cut_size = cut_size
663
- tqdm.write(f'cut size: {self.cut_size}')
664
- self.cutn = cutn
665
- self.cut_pow = cut_pow
666
- self.noise_fac = 0.1
667
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
668
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
669
- self.augs = nn.Sequential(
670
- K.RandomHorizontalFlip(p=0.5),
671
- # K.RandomSolarize(0.01, 0.01, p=0.7),
672
- K.RandomSharpness(0.3,p=0.4),
673
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),
674
- K.RandomPerspective(0.2,p=0.4),
675
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7))
676
-
677
-
678
- def set_cut_pow(self, cut_pow):
679
- self.cut_pow = cut_pow
680
-
681
- def forward(self, input):
682
- sideY, sideX = input.shape[2:4]
683
- max_size = min(sideX, sideY)
684
- min_size = min(sideX, sideY, self.cut_size)
685
- cutouts = []
686
- cutouts_full = []
687
- noise_fac = 0.1
688
-
689
-
690
- min_size_width = min(sideX, sideY)
691
- lower_bound = float(self.cut_size/min_size_width)
692
-
693
- for ii in range(self.cutn):
694
-
695
-
696
- # size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
697
- randsize = torch.zeros(1,).normal_(mean=.8, std=.3).clip(lower_bound,1.)
698
- size_mult = randsize ** self.cut_pow
699
- size = int(min_size_width * (size_mult.clip(lower_bound, 1.))) # replace .5 with a result for 224 the default large size is .95
700
- # size = int(min_size_width*torch.zeros(1,).normal_(mean=.9, std=.3).clip(lower_bound, .95)) # replace .5 with a result for 224 the default large size is .95
701
-
702
- offsetx = torch.randint(0, sideX - size + 1, ())
703
- offsety = torch.randint(0, sideY - size + 1, ())
704
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
705
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
706
-
707
-
708
- cutouts = torch.cat(cutouts, dim=0)
709
- cutouts = clamp_with_grad(cutouts, 0, 1)
710
-
711
- #if args.use_augs:
712
- cutouts = self.augs(cutouts)
713
- if self.noise_fac:
714
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
715
- cutouts = cutouts + facs * torch.randn_like(cutouts)
716
- return cutouts
717
-
718
- class MakeCutoutsWyvern(nn.Module):
719
- def __init__(self, cut_size, cutn, cut_pow, augs):
720
- super().__init__()
721
- self.cut_size = cut_size
722
- tqdm.write(f'cut size: {self.cut_size}')
723
- self.cutn = cutn
724
- self.cut_pow = cut_pow
725
- self.noise_fac = 0.1
726
- self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))
727
- self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))
728
- self.augs = augs
729
-
730
- def forward(self, input):
731
- sideY, sideX = input.shape[2:4]
732
- max_size = min(sideX, sideY)
733
- min_size = min(sideX, sideY, self.cut_size)
734
- cutouts = []
735
- for _ in range(self.cutn):
736
- size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)
737
- offsetx = torch.randint(0, sideX - size + 1, ())
738
- offsety = torch.randint(0, sideY - size + 1, ())
739
- cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]
740
- cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))
741
- return clamp_with_grad(torch.cat(cutouts, dim=0), 0, 1)
742
-
743
- def load_vqgan_model(config_path, checkpoint_path):
744
- config = OmegaConf.load(config_path)
745
- if config.model.target == 'taming.models.vqgan.VQModel':
746
- model = vqgan.VQModel(**config.model.params)
747
- model.eval().requires_grad_(False)
748
- model.init_from_ckpt(checkpoint_path)
749
- elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':
750
- parent_model = cond_transformer.Net2NetTransformer(**config.model.params)
751
- parent_model.eval().requires_grad_(False)
752
- parent_model.init_from_ckpt(checkpoint_path)
753
- model = parent_model.first_stage_model
754
- elif config.model.target == 'taming.models.vqgan.GumbelVQ':
755
- model = vqgan.GumbelVQ(**config.model.params)
756
- #print(config.model.params)
757
- model.eval().requires_grad_(False)
758
- model.init_from_ckpt(checkpoint_path)
759
- else:
760
- raise ValueError(f'unknown model type: {config.model.target}')
761
- del model.loss
762
- return model
763
-
764
- import PIL
765
-
766
- def resize_image(image, out_size):
767
- ratio = image.size[0] / image.size[1]
768
- area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])
769
- size = round((area * ratio)**0.5), round((area / ratio)**0.5)
770
- return image.resize(size, PIL.Image.LANCZOS)
771
-
772
- class GaussianBlur2d(nn.Module):
773
- def __init__(self, sigma, window=0, mode='reflect', value=0):
774
- super().__init__()
775
- self.mode = mode
776
- self.value = value
777
- if not window:
778
- window = max(math.ceil((sigma * 6 + 1) / 2) * 2 - 1, 3)
779
- if sigma:
780
- kernel = torch.exp(-(torch.arange(window) - window // 2)**2 / 2 / sigma**2)
781
- kernel /= kernel.sum()
782
- else:
783
- kernel = torch.ones([1])
784
- self.register_buffer('kernel', kernel)
785
-
786
- def forward(self, input):
787
- n, c, h, w = input.shape
788
- input = input.view([n * c, 1, h, w])
789
- start_pad = (self.kernel.shape[0] - 1) // 2
790
- end_pad = self.kernel.shape[0] // 2
791
- input = F.pad(input, (start_pad, end_pad, start_pad, end_pad), self.mode, self.value)
792
- input = F.conv2d(input, self.kernel[None, None, None, :])
793
- input = F.conv2d(input, self.kernel[None, None, :, None])
794
- return input.view([n, c, h, w])
795
-
796
- BUF_SIZE = 65536
797
- def get_digest(path, alg=hashlib.sha256):
798
- hash = alg()
799
- #print(path)
800
- with open(path, 'rb') as fp:
801
- while True:
802
- data = fp.read(BUF_SIZE)
803
- if not data: break
804
- hash.update(data)
805
- return b64encode(hash.digest()).decode('utf-8')
806
-
807
- flavordict = {
808
- "cumin": MakeCutoutsCumin,
809
- "holywater": MakeCutoutsHolywater,
810
- "old_holywater": MakeCutoutsOldHolywater,
811
- "ginger": MakeCutoutsGinger,
812
- "zynth": MakeCutoutsZynth,
813
- "wyvern": MakeCutoutsWyvern,
814
- "aaron": MakeCutoutsAaron,
815
- "moth": MakeCutoutsMoth,
816
- "juu": MakeCutoutsJuu,
817
- }
818
-
819
- @torch.jit.script
820
- def gelu_impl(x):
821
- """OpenAI's gelu implementation."""
822
- return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * x * (1.0 + 0.044715 * x * x)))
823
-
824
-
825
- def gelu(x):
826
- return gelu_impl(x)
827
-
828
-
829
- class MSEDecayLoss(nn.Module):
830
- def __init__(self, init_weight, mse_decay_rate, mse_epoches, mse_quantize ):
831
- super().__init__()
832
-
833
- self.init_weight = init_weight
834
- self.has_init_image = False
835
- self.mse_decay = init_weight / mse_epoches if init_weight else 0
836
- self.mse_decay_rate = mse_decay_rate
837
- self.mse_weight = init_weight
838
- self.mse_epoches = mse_epoches
839
- self.mse_quantize = mse_quantize
840
-
841
- @torch.no_grad()
842
- def set_target( self, z_tensor, model ):
843
- z_tensor = z_tensor.detach().clone()
844
- if self.mse_quantize:
845
- z_tensor = vector_quantize(z_tensor.movedim(1, 3), model.quantize.embedding.weight).movedim(3, 1)#z.average
846
- self.z_orig = z_tensor
847
-
848
- def forward( self, i, z ):
849
- if self.is_active(i):
850
- return F.mse_loss(z, self.z_orig) * self.mse_weight / 2
851
- return 0
852
-
853
- def is_active(self, i):
854
- if not self.init_weight:
855
- return False
856
- if i <= self.mse_decay_rate and not self.has_init_image:
857
- return False
858
- return True
859
-
860
- @torch.no_grad()
861
- def step( self, i ):
862
-
863
- if i % self.mse_decay_rate == 0 and i != 0 and i < self.mse_decay_rate * self.mse_epoches:
864
-
865
- if self.mse_weight - self.mse_decay > 0 and self.mse_weight - self.mse_decay >= self.mse_decay:
866
- self.mse_weight -= self.mse_decay
867
- else:
868
- self.mse_weight = 0
869
- #print(f"updated mse weight: {self.mse_weight}")
870
-
871
- return True
872
-
873
- return False
874
-
875
- class TVLoss(nn.Module):
876
- def forward(self, input):
877
- input = F.pad(input, (0, 1, 0, 1), 'replicate')
878
- x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]
879
- y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]
880
- diff = x_diff**2 + y_diff**2 + 1e-8
881
- return diff.mean(dim=1).sqrt().mean()
882
-
883
- class MultiClipLoss(nn.Module):
884
- def __init__(self, clip_models, text_prompt, cutn, cut_pow=1., clip_weight=1. ):
885
- super().__init__()
886
-
887
- # Load Clip
888
- self.perceptors = []
889
- for cm in clip_models:
890
- sys.stdout.write(f"Loading {cm[0]} ...\n")
891
- sys.stdout.flush()
892
- c = clip.load(cm[0], jit=False)[0].eval().requires_grad_(False).to(device)
893
- self.perceptors.append( { 'res': c.visual.input_resolution, 'perceptor': c, 'weight': cm[1], 'prompts':[] } )
894
- self.perceptors.sort(key=lambda e: e['res'], reverse=True)
895
-
896
- # Make Cutouts
897
- self.max_cut_size = self.perceptors[0]['res']
898
- #self.make_cuts = flavordict[flavor](self.max_cut_size, cutn, cut_pow)
899
- #cutouts = flavordict[flavor](self.max_cut_size, cutn, cut_pow=cut_pow, augs=args.augs)
900
-
901
- # Get Prompt Embeddings
902
- #texts = [phrase.strip() for phrase in text_prompt.split("|")]
903
- #if text_prompt == ['']:
904
- # texts = []
905
- texts = text_prompt
906
- self.pMs = []
907
- for prompt in texts:
908
- txt, weight, stop = parse_prompt(prompt)
909
- clip_token = clip.tokenize(txt).to(device)
910
- for p in self.perceptors:
911
- embed = p['perceptor'].encode_text(clip_token).float()
912
- embed_normed = F.normalize(embed.unsqueeze(0), dim=2)
913
- p['prompts'].append({'embed_normed':embed_normed,'weight':torch.as_tensor(weight, device=device),'stop':torch.as_tensor(stop, device=device)})
914
-
915
- # Prep Augments
916
- self.normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
917
- std=[0.26862954, 0.26130258, 0.27577711])
918
-
919
- self.augs = nn.Sequential(
920
- K.RandomHorizontalFlip(p=0.5),
921
- K.RandomSharpness(0.3,p=0.1),
922
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
923
- K.RandomPerspective(0.2,p=0.4, ),
924
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
925
- K.RandomGrayscale(p=0.15)
926
- )
927
- self.noise_fac = 0.1
928
-
929
- self.clip_weight = clip_weight
930
-
931
- def prepare_cuts(self,img):
932
- cutouts = self.make_cuts(img)
933
- cutouts = self.augs(cutouts)
934
- if self.noise_fac:
935
- facs = cutouts.new_empty([cutouts.shape[0], 1, 1, 1]).uniform_(0, self.noise_fac)
936
- cutouts = cutouts + facs * torch.randn_like(cutouts)
937
- cutouts = self.normalize(cutouts)
938
- return cutouts
939
-
940
- def forward(self, i, img):
941
- cutouts = checkpoint(self.prepare_cuts, img)
942
- loss = []
943
-
944
- current_cuts = cutouts
945
- currentres = self.max_cut_size
946
- for p in self.perceptors:
947
- if currentres != p['res']:
948
- current_cuts = resample(cutouts,(p['res'],p['res']))
949
- currentres = p['res']
950
-
951
- iii = p['perceptor'].encode_image(current_cuts).float()
952
- input_normed = F.normalize(iii.unsqueeze(1), dim=2)
953
- for prompt in p['prompts']:
954
- dists = input_normed.sub(prompt['embed_normed']).norm(dim=2).div(2).arcsin().pow(2).mul(2)
955
- dists = dists * prompt['weight'].sign()
956
- l = prompt['weight'].abs() * replace_grad(dists, torch.maximum(dists, prompt['stop'])).mean()
957
- loss.append(l * p['weight'])
958
-
959
- return loss
960
-
961
- class ModelHost:
962
- def __init__(self, args):
963
- self.args = args
964
- self.model, self.perceptor = None, None
965
- self.make_cutouts = None
966
- self.alt_make_cutouts = None
967
- self.imageSize = None
968
- self.prompts = None
969
- self.opt = None
970
- self.normalize = None
971
- self.z, self.z_orig, self.z_min, self.z_max = None, None, None, None
972
- self.metadata = None
973
- self.mse_weight = 0
974
- self.normal_flip_optim = None
975
- self.usealtprompts = False
976
-
977
- def setup_metadata(self, seed):
978
- metadata = {k:v for k,v in vars(self.args).items()}
979
- del metadata['max_iterations']
980
- del metadata['display_freq']
981
- metadata['seed'] = seed
982
- if (metadata['init_image']):
983
- path = metadata['init_image']
984
- digest = get_digest(path)
985
- metadata['init_image'] = (path, digest)
986
- if (metadata['image_prompts']):
987
- prompts = []
988
- for prompt in metadata['image_prompts']:
989
- path = prompt
990
- digest = get_digest(path)
991
- prompts.append((path,digest))
992
- metadata['image_prompts'] = prompts
993
- self.metadata = metadata
994
-
995
- def setup_model(self, x):
996
- i = x
997
- device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
998
- """
999
- print('Using device:', device)
1000
- if self.args.prompts:
1001
- print('Using prompts:', self.args.prompts)
1002
- if self.args.altprompts:
1003
- print('Using alternate augment set prompts:', self.args.altprompts)
1004
- if self.args.image_prompts:
1005
- print('Using image prompts:', self.args.image_prompts)
1006
- if args.seed is None:
1007
- seed = torch.seed()
1008
- else:
1009
- seed = args.seed
1010
- torch.manual_seed(seed)
1011
- print('Using seed:', seed)
1012
- """
1013
- model = load_vqgan_model(vqgan_config, vqgan_model).to(device)
1014
-
1015
- active_clips = bool(self.args.clip_model2) + bool(self.args.clip_model3) + bool(self.args.clip_model4) + bool(self.args.clip_model5) + bool(self.args.clip_model6)
1016
- if active_clips != 0: clip_weight = round(1/(active_clips+1), 2)
1017
- clip_models = [[clip_model, 1.0]]
1018
- if self.args.clip_model2:
1019
- clip_models = [[self.args.clip_model, clip_weight], [self.args.clip_model2, clip_weight]]
1020
- if self.args.clip_model3:
1021
- clip_models = [[self.args.clip_model, clip_weight], [self.args.clip_model2, clip_weight], [self.args.clip_model3, clip_weight]]
1022
- if self.args.clip_model4:
1023
- clip_models = [[self.args.clip_model, clip_weight], [self.args.clip_model2, clip_weight], [self.args.clip_model3, clip_weight], [self.args.clip_model4, clip_weight]]
1024
- if self.args.clip_model5:
1025
- clip_models = [[self.args.clip_model, clip_weight], [self.args.clip_model2, clip_weight], [self.args.clip_model3, clip_weight], [self.args.clip_model4, clip_weight], [self.args.clip_model5, clip_weight]]
1026
- if self.args.clip_model6:
1027
- clip_models = [[self.args.clip_model, clip_weight], [self.args.clip_model2, clip_weight], [self.args.clip_model3, clip_weight], [self.args.clip_model4, clip_weight], [self.args.clip_model5, clip_weight], [self.args.clip_model6, clip_weight]]
1028
- #print(clip_models)
1029
-
1030
- clip_loss = MultiClipLoss(clip_models, self.args.prompts, cutn=self.args.cutn)
1031
-
1032
- #update_random(self.args.gen_seed, 'image generation')
1033
-
1034
- #[0].eval().requires_grad_(False)
1035
- #perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
1036
- #[0].eval().requires_grad_(True)
1037
-
1038
- cut_size = perceptor.visual.input_resolution
1039
-
1040
- if self.args.is_gumbel:
1041
- e_dim = model.quantize.embedding_dim
1042
- else:
1043
- e_dim = model.quantize.e_dim
1044
-
1045
- f = 2**(model.decoder.num_resolutions - 1)
1046
-
1047
- make_cutouts = flavordict[flavor](cut_size, args.mse_cutn, cut_pow=args.mse_cut_pow,augs=args.augs)
1048
-
1049
- #make_cutouts = MakeCutouts(cut_size, args.mse_cutn, cut_pow=args.mse_cut_pow,augs=args.augs)
1050
- if args.altprompts:
1051
- self.usealtprompts = True
1052
- self.alt_make_cutouts = flavordict[flavor](cut_size, args.mse_cutn, cut_pow=args.alt_mse_cut_pow,augs=args.altaugs)
1053
- #self.alt_make_cutouts = MakeCutouts(cut_size, args.mse_cutn, cut_pow=args.alt_mse_cut_pow,augs=args.altaugs)
1054
-
1055
- if self.args.is_gumbel:
1056
- n_toks = model.quantize.n_embed
1057
- else:
1058
- n_toks = model.quantize.n_e
1059
-
1060
- toksX, toksY = args.size[0] // f, args.size[1] // f
1061
- sideX, sideY = toksX * f, toksY * f
1062
-
1063
- if self.args.is_gumbel:
1064
- z_min = model.quantize.embed.weight.min(dim=0).values[None, :, None, None]
1065
- z_max = model.quantize.embed.weight.max(dim=0).values[None, :, None, None]
1066
- else:
1067
- z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]
1068
- z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]
1069
-
1070
- from PIL import Image
1071
- import cv2
1072
- #-------
1073
- working_dir = self.args.folder_name
1074
-
1075
- if self.args.init_image != "":
1076
- img_0 = cv2.imread(init_image)
1077
- z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)
1078
- elif not os.path.isfile(f'{working_dir}/steps/{i:04d}.png'):
1079
- one_hot = F.one_hot(torch.randint(n_toks, [toksY * toksX], device=device), n_toks).float()
1080
- if self.args.is_gumbel:
1081
- z = one_hot @ model.quantize.embed.weight
1082
- else:
1083
- z = one_hot @ model.quantize.embedding.weight
1084
- z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)
1085
- else:
1086
- if save_all_iterations:
1087
- img_0 = cv2.imread(
1088
- f'{working_dir}/steps/{i:04d}_{iterations_per_frame}.png')
1089
- else:
1090
- # Hack to prevent colour inversion on every frame
1091
- img_temp = cv2.imread(f'{working_dir}/steps/{i}.png')
1092
- imageio.imwrite('inverted_temp.png', img_temp)
1093
- img_0 = cv2.imread('inverted_temp.png')
1094
- center = (1*img_0.shape[1]//2, 1*img_0.shape[0]//2)
1095
- trans_mat = np.float32(
1096
- [[1, 0, 10],
1097
- [0, 1, 10]]
1098
- )
1099
- rot_mat = cv2.getRotationMatrix2D( center, 10, 20 )
1100
-
1101
- trans_mat = np.vstack([trans_mat, [0,0,1]])
1102
- rot_mat = np.vstack([rot_mat, [0,0,1]])
1103
- transformation_matrix = np.matmul(rot_mat, trans_mat)
1104
-
1105
- img_0 = cv2.warpPerspective(
1106
- img_0,
1107
- transformation_matrix,
1108
- (img_0.shape[1], img_0.shape[0]),
1109
- borderMode=cv2.BORDER_WRAP
1110
- )
1111
- z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)
1112
-
1113
- def save_output(i, img, suffix='zoomed'):
1114
- filename = \
1115
- f"{working_dir}/steps/{i:04}{'_' + suffix if suffix else ''}.png"
1116
- imageio.imwrite(filename, np.array(img))
1117
-
1118
- save_output(i, img_0)
1119
- #-------
1120
- if args.init_image:
1121
- pil_image = Image.open(args.init_image).convert('RGB')
1122
- pil_image = pil_image.resize((sideX, sideY), Image.LANCZOS)
1123
- z, *_ = model.encode(TF.to_tensor(pil_image).to(device).unsqueeze(0) * 2 - 1)
1124
- else:
1125
- one_hot = F.one_hot(torch.randint(n_toks, [toksY * toksX], device=device), n_toks).float()
1126
- if self.args.is_gumbel:
1127
- z = one_hot @ model.quantize.embed.weight
1128
- else:
1129
- z = one_hot @ model.quantize.embedding.weight
1130
- z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)
1131
- z = EMATensor(z, args.ema_val)
1132
-
1133
- if args.mse_with_zeros and not args.init_image:
1134
- z_orig = torch.zeros_like(z.tensor)
1135
- else:
1136
- z_orig = z.tensor.clone()
1137
- z.requires_grad_(True)
1138
- #opt = optim.AdamW(z.parameters(), lr=args.mse_step_size, weight_decay=0.00000000)
1139
- if self.normal_flip_optim == True:
1140
- if randint(1,2) == 1:
1141
- opt = torch.optim.AdamW(z.parameters(), lr=args.step_size, weight_decay=0.00000000)
1142
- #opt = Ranger21(z.parameters(), lr=args.step_size, weight_decay=0.00000000)
1143
- else:
1144
- opt = optim.DiffGrad(z.parameters(), lr=args.step_size, weight_decay=0.00000000)
1145
- else:
1146
- opt = torch.optim.AdamW(z.parameters(), lr=args.step_size, weight_decay=0.00000000)
1147
-
1148
- self.cur_step_size =args.mse_step_size
1149
-
1150
- normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
1151
- std=[0.26862954, 0.26130258, 0.27577711])
1152
-
1153
- pMs = []
1154
- altpMs = []
1155
-
1156
- for prompt in args.prompts:
1157
- txt, weight, stop = parse_prompt(prompt)
1158
- embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
1159
- pMs.append(Prompt(embed, weight, stop).to(device))
1160
-
1161
- for prompt in args.altprompts:
1162
- txt, weight, stop = parse_prompt(prompt)
1163
- embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
1164
- altpMs.append(Prompt(embed, weight, stop).to(device))
1165
-
1166
- from PIL import Image
1167
- for prompt in args.image_prompts:
1168
- path, weight, stop = parse_prompt(prompt)
1169
- img = resize_image(Image.open(path).convert('RGB'), (sideX, sideY))
1170
- batch = make_cutouts(TF.to_tensor(img).unsqueeze(0).to(device))
1171
- embed = perceptor.encode_image(normalize(batch)).float()
1172
- pMs.append(Prompt(embed, weight, stop).to(device))
1173
-
1174
- for seed, weight in zip(args.noise_prompt_seeds, args.noise_prompt_weights):
1175
- gen = torch.Generator().manual_seed(seed)
1176
- embed = torch.empty([1, perceptor.visual.output_dim]).normal_(generator=gen)
1177
- pMs.append(Prompt(embed, weight).to(device))
1178
- if(self.usealtprompts):
1179
- altpMs.append(Prompt(embed, weight).to(device))
1180
-
1181
- self.model, self.perceptor = model, perceptor
1182
- self.make_cutouts = make_cutouts
1183
- self.imageSize = (sideX, sideY)
1184
- self.prompts = pMs
1185
- self.altprompts = altpMs
1186
- self.opt = opt
1187
- self.normalize = normalize
1188
- self.z, self.z_orig, self.z_min, self.z_max = z, z_orig, z_min, z_max
1189
- self.setup_metadata(args2.seed)
1190
- self.mse_weight = self.args.init_weight
1191
-
1192
- def synth(self, z):
1193
- if self.args.is_gumbel:
1194
- z_q = vector_quantize(z.movedim(1, 3), self.model.quantize.embed.weight).movedim(3, 1)
1195
- else:
1196
- z_q = vector_quantize(z.movedim(1, 3), self.model.quantize.embedding.weight).movedim(3, 1)
1197
- return clamp_with_grad(self.model.decode(z_q).add(1).div(2), 0, 1)
1198
-
1199
- def add_metadata(self, path, i):
1200
- imfile = PngImageFile(path)
1201
- meta = PngInfo()
1202
- step_meta = {'iterations':i}
1203
- step_meta.update(self.metadata)
1204
- #meta.add_itxt('vqgan-params', json.dumps(step_meta), zip=True)
1205
- imfile.save(path, pnginfo=meta)
1206
- #Hey you. This one's for Glooperpogger#7353 on Discord (Gloop has a gun), they are a nice snek
1207
-
1208
- @torch.no_grad()
1209
- def checkin(self, i, losses, x):
1210
- out = self.synth(self.z.average)
1211
- TF.to_pil_image(out[0].cpu()).save(args2.image_file)
1212
-
1213
- def unique_index(self, batchpath):
1214
- i = 0
1215
- while i < 10000:
1216
- if os.path.isfile(batchpath+"/"+str(i)+".png"):
1217
- i = i+1
1218
- else:
1219
- return batchpath+"/"+str(i)+".png"
1220
-
1221
- def ascend_txt(self, i):
1222
- out = self.synth(self.z.tensor)
1223
- iii = self.perceptor.encode_image(self.normalize(self.make_cutouts(out))).float()
1224
-
1225
-
1226
- result = []
1227
- if self.args.init_weight and self.mse_weight > 0:
1228
- result.append(F.mse_loss(self.z.tensor, self.z_orig) * self.mse_weight / 2)
1229
-
1230
- for prompt in self.prompts:
1231
- result.append(prompt(iii))
1232
-
1233
- if self.usealtprompts:
1234
- iii = self.perceptor.encode_image(self.normalize(self.alt_make_cutouts(out))).float()
1235
- for prompt in self.altprompts:
1236
- result.append(prompt(iii))
1237
-
1238
- """
1239
- img = np.array(out.mul(255).clamp(0, 255)[0].cpu().detach().numpy().astype(np.uint8))[:,:,:]
1240
- img = np.transpose(img, (1, 2, 0))
1241
- im_path = 'progress.png'
1242
- imageio.imwrite(im_path, np.array(img))
1243
- self.add_metadata(im_path, i)
1244
- """
1245
- return result
1246
-
1247
- def train(self, i,x):
1248
- self.opt.zero_grad()
1249
- mse_decay = self.args.mse_decay
1250
- mse_decay_rate = self.args.mse_decay_rate
1251
- lossAll = self.ascend_txt(i)
1252
-
1253
- sys.stdout.write("Iteration {}".format(i)+"\n")
1254
- sys.stdout.flush()
1255
-
1256
- """
1257
- if i < args.mse_end and i % args.mse_display_freq == 0:
1258
- self.checkin(i, lossAll, x)
1259
- if i == args.mse_end:
1260
- self.checkin(i,lossAll,x)
1261
- if i > args.mse_end and (i-args.mse_end) % args.display_freq == 0:
1262
- self.checkin(i, lossAll, x)
1263
- """
1264
- if i % args2.iterations == 0:
1265
- self.checkin(i, lossAll, x)
1266
-
1267
-
1268
-
1269
- loss = sum(lossAll)
1270
- loss.backward()
1271
- self.opt.step()
1272
- with torch.no_grad():
1273
- if self.mse_weight > 0 and self.args.init_weight and i > 0 and i%mse_decay_rate == 0:
1274
- if self.args.is_gumbel:
1275
- self.z_orig = vector_quantize(self.z.average.movedim(1, 3), self.model.quantize.embed.weight).movedim(3, 1)
1276
- else:
1277
- self.z_orig = vector_quantize(self.z.average.movedim(1, 3), self.model.quantize.embedding.weight).movedim(3, 1)
1278
- if self.mse_weight - mse_decay > 0:
1279
- self.mse_weight = self.mse_weight - mse_decay
1280
- #print(f"updated mse weight: {self.mse_weight}")
1281
- else:
1282
- self.mse_weight = 0
1283
- self.make_cutouts = flavordict[flavor](self.perceptor.visual.input_resolution, args.cutn, cut_pow=args.cut_pow, augs = args.augs)
1284
- if self.usealtprompts:
1285
- self.alt_make_cutouts = flavordict[flavor](self.perceptor.visual.input_resolution, args.cutn, cut_pow=args.alt_cut_pow, augs = args.altaugs)
1286
- self.z = EMATensor(self.z.average, args.ema_val)
1287
- self.new_step_size =args.step_size
1288
- self.opt = torch.optim.AdamW(self.z.parameters(), lr=args.step_size, weight_decay=0.00000000)
1289
- #print(f"updated mse weight: {self.mse_weight}")
1290
- if i > args.mse_end:
1291
- if args.step_size != args.final_step_size and args.max_iterations > 0:
1292
- progress = (i-args.mse_end)/(args.max_iterations)
1293
- self.cur_step_size = lerp(step_size, final_step_size,progress)
1294
- for g in self.opt.param_groups:
1295
- g['lr'] = self.cur_step_size
1296
- #self.z.copy_(self.z.maximum(self.z_min).minimum(self.z_max))
1297
-
1298
- def run(self,x):
1299
- i = 0
1300
- try:
1301
- pbar = tqdm(range(int(args.max_iterations + args.mse_end)))
1302
- while True:
1303
- self.train(i,x)
1304
- if i > 0 and i%args.mse_decay_rate==0 and self.mse_weight > 0:
1305
- self.z = EMATensor(self.z.average, args.ema_val)
1306
- self.opt = torch.optim.AdamW(self.z.parameters(), lr=args.mse_step_size, weight_decay=0.00000000)
1307
- #self.opt = optim.Adgarad(self.z.parameters(), lr=args.mse_step_size, weight_decay=0.00000000)
1308
- if i >= args.max_iterations + args.mse_end:
1309
- pbar.close()
1310
- break
1311
- self.z.update()
1312
- i += 1
1313
- pbar.update()
1314
- except KeyboardInterrupt:
1315
- pass
1316
- return i
1317
-
1318
- def add_noise(img):
1319
-
1320
- # Getting the dimensions of the image
1321
- row , col = img.shape
1322
-
1323
- # Randomly pick some pixels in the
1324
- # image for coloring them white
1325
- # Pick a random number between 300 and 10000
1326
- number_of_pixels = random.randint(300, 10000)
1327
- for i in range(number_of_pixels):
1328
-
1329
- # Pick a random y coordinate
1330
- y_coord=random.randint(0, row - 1)
1331
-
1332
- # Pick a random x coordinate
1333
- x_coord=random.randint(0, col - 1)
1334
-
1335
- # Color that pixel to white
1336
- img[y_coord][x_coord] = 255
1337
-
1338
- # Randomly pick some pixels in
1339
- # the image for coloring them black
1340
- # Pick a random number between 300 and 10000
1341
- number_of_pixels = random.randint(300 , 10000)
1342
- for i in range(number_of_pixels):
1343
-
1344
- # Pick a random y coordinate
1345
- y_coord=random.randint(0, row - 1)
1346
-
1347
- # Pick a random x coordinate
1348
- x_coord=random.randint(0, col - 1)
1349
-
1350
- # Color that pixel to black
1351
- img[y_coord][x_coord] = 0
1352
-
1353
- return img
1354
-
1355
- import io
1356
- import base64
1357
- def image_to_data_url(img, ext):
1358
- img_byte_arr = io.BytesIO()
1359
- img.save(img_byte_arr, format=ext)
1360
- img_byte_arr = img_byte_arr.getvalue()
1361
- # ext = filename.split('.')[-1]
1362
- prefix = f'data:image/{ext};base64,'
1363
- return prefix + base64.b64encode(img_byte_arr).decode('utf-8')
1364
-
1365
- import torch
1366
- import math
1367
- device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
1368
-
1369
- def rand_perlin_2d(shape, res, fade = lambda t: 6*t**5 - 15*t**4 + 10*t**3):
1370
- delta = (res[0] / shape[0], res[1] / shape[1])
1371
- d = (shape[0] // res[0], shape[1] // res[1])
1372
-
1373
- grid = torch.stack(torch.meshgrid(torch.arange(0, res[0], delta[0]), torch.arange(0, res[1], delta[1])), dim = -1) % 1
1374
- angles = 2*math.pi*torch.rand(res[0]+1, res[1]+1)
1375
- gradients = torch.stack((torch.cos(angles), torch.sin(angles)), dim = -1)
1376
-
1377
- tile_grads = lambda slice1, slice2: gradients[slice1[0]:slice1[1], slice2[0]:slice2[1]].repeat_interleave(d[0], 0).repeat_interleave(d[1], 1)
1378
- dot = lambda grad, shift: (torch.stack((grid[:shape[0],:shape[1],0] + shift[0], grid[:shape[0],:shape[1], 1] + shift[1] ), dim = -1) * grad[:shape[0], :shape[1]]).sum(dim = -1)
1379
-
1380
- n00 = dot(tile_grads([0, -1], [0, -1]), [0, 0])
1381
- n10 = dot(tile_grads([1, None], [0, -1]), [-1, 0])
1382
- n01 = dot(tile_grads([0, -1],[1, None]), [0, -1])
1383
- n11 = dot(tile_grads([1, None], [1, None]), [-1,-1])
1384
- t = fade(grid[:shape[0], :shape[1]])
1385
- return math.sqrt(2) * torch.lerp(torch.lerp(n00, n10, t[..., 0]), torch.lerp(n01, n11, t[..., 0]), t[..., 1])
1386
-
1387
- def rand_perlin_2d_octaves( desired_shape, octaves=1, persistence=0.5):
1388
- shape = torch.tensor(desired_shape)
1389
- shape = 2 ** torch.ceil( torch.log2( shape ) )
1390
- shape = shape.type(torch.int)
1391
-
1392
- max_octaves = int(min(octaves,math.log(shape[0])/math.log(2), math.log(shape[1])/math.log(2)))
1393
- res = torch.floor( shape / 2 ** max_octaves).type(torch.int)
1394
-
1395
- noise = torch.zeros(list(shape))
1396
- frequency = 1
1397
- amplitude = 1
1398
- for _ in range(max_octaves):
1399
- noise += amplitude * rand_perlin_2d(shape, (frequency*res[0], frequency*res[1]))
1400
- frequency *= 2
1401
- amplitude *= persistence
1402
-
1403
- return noise[:desired_shape[0],:desired_shape[1]]
1404
-
1405
- def rand_perlin_rgb( desired_shape, amp=0.1, octaves=6 ):
1406
- r = rand_perlin_2d_octaves( desired_shape, octaves )
1407
- g = rand_perlin_2d_octaves( desired_shape, octaves )
1408
- b = rand_perlin_2d_octaves( desired_shape, octaves )
1409
- rgb = ( torch.stack((r,g,b)) * amp + 1 ) * 0.5
1410
- return rgb.unsqueeze(0).clip(0,1).to(device)
1411
-
1412
-
1413
- def pyramid_noise_gen(shape, octaves=5, decay=1.):
1414
- n, c, h, w = shape
1415
- noise = torch.zeros([n, c, 1, 1])
1416
- max_octaves = int(min(math.log(h)/math.log(2), math.log(w)/math.log(2)))
1417
- if octaves is not None and 0 < octaves:
1418
- max_octaves = min(octaves,max_octaves)
1419
- for i in reversed(range(max_octaves)):
1420
- h_cur, w_cur = h // 2**i, w // 2**i
1421
- noise = F.interpolate(noise, (h_cur, w_cur), mode='bicubic', align_corners=False)
1422
- noise += ( torch.randn([n, c, h_cur, w_cur]) / max_octaves ) * decay**( max_octaves - (i+1) )
1423
- return noise
1424
-
1425
- def rand_z(model, toksX, toksY):
1426
- e_dim = model.quantize.e_dim
1427
- n_toks = model.quantize.n_e
1428
- z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]
1429
- z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]
1430
-
1431
- one_hot = F.one_hot(torch.randint(n_toks, [toksY * toksX], device=device), n_toks).float()
1432
- z = one_hot @ model.quantize.embedding.weight
1433
- z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)
1434
-
1435
- return z
1436
-
1437
-
1438
- def make_rand_init( mode, model, perlin_octaves, perlin_weight, pyramid_octaves, pyramid_decay, toksX, toksY, f ):
1439
-
1440
- if mode == 'VQGAN ZRand':
1441
- return rand_z(model, toksX, toksY)
1442
- elif mode == 'Perlin Noise':
1443
- rand_init = rand_perlin_rgb((toksY * f, toksX * f), perlin_weight, perlin_octaves )
1444
- z, *_ = model.encode(rand_init * 2 - 1)
1445
- return z
1446
- elif mode == 'Pyramid Noise':
1447
- rand_init = pyramid_noise_gen( (1,3,toksY * f, toksX * f), pyramid_octaves, pyramid_decay).to(device)
1448
- rand_init = ( rand_init * 0.5 + 0.5 ).clip(0,1)
1449
- z, *_ = model.encode(rand_init * 2 - 1)
1450
- return z
1451
-
1452
- # Commented out IPython magic to ensure Python compatibility.
1453
- #@title <font color="lightgreen" size="+3">←</font> <font size="+2">💠</font> Selection of models to download <font size="+2">💠</font>
1454
- #@markdown By default, the notebook downloads the ImageNet 16384 model. Others, such as COCO, WikiArt 1024, WikiArt 16384, FacesHQ, and S-FLCKR, are heavy, so select only the ones you intend to use. (COCO 1 Stage is a lighter COCO model; WikiArt 7 Mil is a lighter, and worse, WikiArt model.)
1455
- # %cd /content/
1456
-
1457
- #import gdown
1458
- import os
1459
-
1460
- imagenet_1024 = False #@param {type:"boolean"}
1461
- imagenet_16384 = True #@param {type:"boolean"}
1462
- gumbel_8192 = False #@param {type:"boolean"}
1463
- sber_gumbel = False #@param {type:"boolean"}
1464
- #imagenet_cin = False #@param {type:"boolean"}
1465
- coco = False #@param {type:"boolean"}
1466
- coco_1stage = False #@param {type:"boolean"}
1467
- faceshq = False #@param {type:"boolean"}
1468
- wikiart_1024 = False #@param {type:"boolean"}
1469
- wikiart_16384 = False #@param {type:"boolean"}
1470
- wikiart_7mil = False #@param {type:"boolean"}
1471
- sflckr = False #@param {type:"boolean"}
1472
-
1473
- ##@markdown Experimental models (these probably won't work; if you know how to make them work, go ahead :D):
1474
- #celebahq = False #@param {type:"boolean"}
1475
- #ade20k = False #@param {type:"boolean"}
1476
- #drin = False #@param {type:"boolean"}
1477
- #gumbel = False #@param {type:"boolean"}
1478
- #gumbel_8192 = False #@param {type:"boolean"}
1479
-
1480
- # Configure and run the model"""
1481
-
1482
- # Commented out IPython magic to ensure Python compatibility.
1483
- #@title <font color="lightgreen" size="+3">←</font> <font size="+2">🏃‍♂️</font> **Configure & Run** <font size="+2">🏃‍♂️</font>
1484
-
1485
- import os
1486
- import random
1487
- import cv2
1488
- #from google.colab import drive
1489
- from PIL import Image
1490
- from importlib import reload
1491
- reload(PIL.TiffTags)
1492
- # %cd /content/
1493
- #@markdown >`prompts` is the list of prompts to give to the AI, separated by `|`. With more than one, it will attempt to mix them together. You can add weights to different parts of the prompt by adding a `p:x` at the end of a prompt (before a `|`) where `p` is the prompt and `x` is the weight.
1494
-
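The `prompt:weight | prompt:weight` syntax described above can be sketched as a small parser. This is a minimal illustration under assumptions: `parse_prompts` is a hypothetical name, and the parsing actually used by the app lives in code not shown in this diff and may differ.

```python
def parse_prompts(prompts):
    """Hypothetical parser: split on '|' and read an optional trailing ':weight'."""
    parsed = []
    for part in prompts.split("|"):
        part = part.strip()
        if not part:
            continue
        text, sep, tail = part.rpartition(":")
        weight = 1.0  # default weight when none is given
        if sep:
            try:
                weight = float(tail)
            except ValueError:
                text = part  # the colon belonged to the prompt text
        else:
            text = part
        parsed.append((text.strip(), weight))
    return parsed

print(parse_prompts("A lush mountain.:1 | 4K HD, realism.:0.63"))
```

A prompt that ends in a colon followed by a number, such as `ratio 16:9`, would be read as weight 9 by this sketch, so such text needs an explicit weight appended.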
1495
-
1496
- #prompts = "A fantasy landscape, by Greg Rutkowski. A lush mountain.:1 | Trending on ArtStation, unreal engine. 4K HD, realism.:0.63" #@param {type:"string"}
1497
-
1498
- prompts = args2.prompt
1499
-
1500
- width = args2.sizex#@param {type:"number"}
1501
- height = args2.sizey #@param {type:"number"}
1502
-
1503
- sys.stdout.write(f"Loading {args2.vqgan_model} ...\n")
1504
- sys.stdout.flush()
1505
-
1506
- #model = "ImageNet 16384" #@param ['ImageNet 16384', 'ImageNet 1024', "Gumbel 8192", "Sber Gumbel", 'WikiArt 1024', 'WikiArt 16384', 'WikiArt 7mil', 'COCO-Stuff', 'COCO 1 Stage', 'FacesHQ', 'S-FLCKR']
1507
- model = args2.vqgan_model
1508
-
1509
- if model == "Gumbel 8192" or model == "Sber Gumbel":
1510
- is_gumbel = True
1511
- else:
1512
- is_gumbel = False
1513
-
1514
- ##@markdown The flavor affects the output greatly. Each has its own characteristics, and depending on which you choose, you'll get a widely different result with the same prompt and seed. Ginger is the default, nothing special. Cumin produces something more like a painting, while Holywater makes everything super funky and/or colorful. Custom is a custom flavor; use the utilities above.
1515
- # Type "old_holywater" to use the old holywater flavor from Hypertron V1
1516
- flavor = args2.flavor #'ginger' #@param ["ginger", "cumin", "holywater", "zynth", "wyvern", "aaron", "moth", "juu", "custom"]
1517
- template = args2.template #'Balanced' #@param ["none", "----------Parameter Tweaking----------", "Balanced", "Detailed", "Consistent Creativity", "Realistic", "Smooth", "Subtle MSE", "Hyper Fast Results", "----------Complete Overhaul----------", "flag", "planet", "creature", "human", "----------Sizes----------", "Size: Square", "Size: Landscape", "Size: Poster", "----------Prompt Modifiers----------", "Better - Fast", "Better - Slow", "Movie Poster", "Negative Prompt", "Better Quality"]
1518
- ##@markdown To use initial or target images, upload them on the left in the file browser. You can also use previous outputs by putting their paths below, e.g. `batch_01/0.png`. If your previous output is saved to Drive, you can use the checkbox so you don't have to type the whole path.
1519
- init = 'default noise' #@param ["default noise", "image", "random image", "salt and pepper noise", "salt and pepper noise on init image"]
1520
-
1521
- if args2.seed_image is None:
1522
- init_image = "" #args2.seed_image #""#@param {type:"string"}
1523
- else:
1524
- init_image = args2.seed_image #""#@param {type:"string"}
1525
-
1526
- if init == "random image":
1527
- url = "https://picsum.photos/" + str(width) + "/" + str(height) + "?blur=" + str(random.randrange(5, 10))
1528
- urllib.request.urlretrieve(url, "Init_Img/Image.png")
1529
- init_image = "Init_Img/Image.png"
1530
- elif init == "random image clear":
1531
- url = "https://source.unsplash.com/random/" + str(width) + "x" + str(height)
1532
- urllib.request.urlretrieve(url, "Init_Img/Image.png")
1533
- init_image = "Init_Img/Image.png"
1534
- elif init == "random image clear 2":
1535
- url = "https://loremflickr.com/" + str(width) + "/" + str(height)
1536
- urllib.request.urlretrieve(url, "Init_Img/Image.png")
1537
- init_image = "Init_Img/Image.png"
1538
- elif init == "salt and pepper noise":
1539
- urllib.request.urlretrieve("https://i.stack.imgur.com/olrL8.png", "Init_Img/Image.png")
1540
- import cv2
1541
- img = cv2.imread('Init_Img/Image.png', 0)
1542
- cv2.imwrite('Init_Img/Image.png', add_noise(img))
1543
- init_image = "Init_Img/Image.png"
1544
- elif init == "salt and pepper noise on init image":
1545
- img = cv2.imread(init_image, 0)
1546
- cv2.imwrite('Init_Img/Image.png', add_noise(img))
1547
- init_image = "Init_Img/Image.png"
1548
- elif init == "perlin noise":
1549
- #For some reason Colab started crashing from this
1550
- import noise
1551
- import numpy as np
1552
- from PIL import Image
1553
- shape = (width, height)
1554
- scale = 100
1555
- octaves = 6
1556
- persistence = 0.5
1557
- lacunarity = 2.0
1558
- seed = np.random.randint(0,100000)
1559
- world = np.zeros(shape)
1560
- for i in range(shape[0]):
1561
- for j in range(shape[1]):
1562
- world[i][j] = noise.pnoise2(i/scale, j/scale, octaves=octaves, persistence=persistence, lacunarity=lacunarity, repeatx=1024, repeaty=1024, base=seed)
1563
- Image.fromarray(prep_world(world)).convert("L").save("Init_Img/Image.png")
1564
- init_image = "Init_Img/Image.png"
1565
- elif init == "black and white":
1566
- url = "https://www.random.org/bitmaps/?format=png&width=300&height=300&zoom=1"
1567
- urllib.request.urlretrieve(url, "Init_Img/Image.png")
1568
- init_image = "Init_Img/Image.png"
1569
-
1570
-
1571
-
1572
- seed = args2.seed#@param {type:"number"}
1573
- #@markdown >iterations excludes iterations spent during the mse phase, if it is being used. The total iterations will be more if `mse_decay_rate` is more than 0.
1574
- iterations = args2.iterations#@param {type:"number"}
1575
- transparent_png = False #@param {type:"boolean"}
1576
-
1577
- #@markdown <font size="+3">⚠</font> **ADVANCED SETTINGS** <font size="+3">⚠</font>
1578
- #@markdown ---
1579
- #@markdown ---
1580
-
1581
- #@markdown >If you want to make multiple images with different prompts, use this. Separate the prompts for different images with a `~` (example: `prompt1~prompt2~prompt3`). Iter is the number of iterations each image runs for. If you use MSE, I'd type a pretty low number (about 10).
1582
- multiple_prompt_batches = False #@param {type:"boolean"}
1583
- multiple_prompt_batches_iter = 300#@param {type:"number"}
1584
-
1585
- #@markdown >`folder_name` is the name of the folder you want to output your result(s) to. Previous outputs will NOT be overwritten. By default, it will be saved to the colab's root folder, but the `save_to_drive` checkbox will save it to `MyDrive\VQGAN_Output` instead.
1586
- folder_name = ""#@param {type:"string"}
1587
- save_to_drive = False #@param {type:"boolean"}
1588
- prompt_experiment = "None" #@param ['None', 'Fever Dream', 'Philipuss’s Basement', 'Vivid Turmoil', 'Mad Dad', 'Platinum', 'Negative Energy']
1589
- if prompt_experiment == "Fever Dream":
1590
- prompts = "<|startoftext|>" + prompts + "<|endoftext|>"
1591
- elif prompt_experiment == "Vivid Turmoil":
1592
- prompts = prompts.replace(" ", "¡")
1593
- prompts = "¬" + prompts + "®"
1594
- elif prompt_experiment == "Mad Dad":
1595
- prompts = prompts.replace(" ", '\\s+')
1596
- elif prompt_experiment == "Platinum":
1597
- prompts = "~!" + prompts + "!~"
1598
- prompts = prompts.replace(" ", '</w>')
1599
- elif prompt_experiment == "Philipuss’s Basement":
1600
- prompts = "<|startoftext|>" + prompts
1601
- prompts = prompts.replace(" ", "<|endoftext|><|startoftext|>")
1602
- elif prompt_experiment == "Lowercase":
1603
- prompts = prompts.lower()
1604
-
1605
- clip_model = "ViT-B/32" #"ViT-B/32" #@param ["ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1606
- clip_model2 = "None" #args2.clip_model_2 #'None' #@param ["None", "ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1607
- clip_model3 = "None" #args2.clip_model_3 #'None' #@param ["None", "ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1608
- clip_model4 = "None" #args2.clip_model_4 #'None' #@param ["None", "ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1609
- clip_model5 = "None" #args2.clip_model_5 #'None' #@param ["None", "ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1610
- clip_model6 = "None" #args2.clip_model_6 #'None' #@param ["None", "ViT-L/14", "ViT-B/32", "ViT-B/16", "RN50x64", "RN50x16", "RN50x4", "RN101", "RN50"]
1611
-
1612
- if clip_model2 == "None": clip_model2 = None
1613
- if clip_model3 == "None": clip_model3 = None
1614
- if clip_model4 == "None": clip_model4 = None
1615
- if clip_model5 == "None": clip_model5 = None
1616
- if clip_model6 == "None": clip_model6 = None
1617
-
1618
- #@markdown >Target images work like prompts: write the name of the image. You can add multiple target images by separating them with a `|`.
1619
- target_images = ""#@param {type:"string"}
1620
-
1621
- #@markdown ><font size="+2">☢</font> Advanced values. Values of cut_pow below 1 prioritize structure over detail, and vice versa for above 1. Step_size affects how wild the change between iterations is, and if final_step_size is not 0, step_size will interpolate towards it over time.
1622
- #@markdown >Cutn affects 'creativity': fewer cutouts lead to more random/creative results, sometimes barely readable, while higher values (90+) lead to very stable, photo-like outputs.
1623
- cutn = 130#@param {type:"number"}
1624
- cut_pow = 1#@param {type:"number"}
1625
- #@markdown >Step_size is like weirdness. Lower: more accurate/realistic, slower; Higher: less accurate/more funky, faster.
1626
- step_size = 0.1#@param {type:"number"}
1627
- #@markdown >Start_step_size is a temporary step_size that will be active only in the first 10 iterations. It (sometimes) helps with speed. If it's set to 0, it won't be used.
1628
- start_step_size = 0 #@param {type:"number"}
1629
- #@markdown >Final_step_size is a goal step_size which the AI will try and reach. If set to 0, it won't be used.
1630
- final_step_size = 0#@param {type:"number"}
1631
- if start_step_size <= 0: start_step_size = step_size
1632
- if final_step_size <= 0: final_step_size = step_size
1633
-
1634
- #@markdown ---
1635
-
1636
- #@markdown >EMA maintains a moving average of trained parameters. The number below is the rate of decay (higher means slower).
1637
- ema_val = 0.98#@param {type:"number"}
1638
-
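The comment on `ema_val` says that a higher decay rate means a slower-moving average. The standard exponential moving average update shows why. This is a generic sketch of that rule; the app's own EMA is implemented in code not shown in this diff, and `ema_update` is a hypothetical name.

```python
def ema_update(average, value, decay=0.98):
    """Standard EMA step: keep `decay` of the old average, blend in the rest."""
    return decay * average + (1.0 - decay) * value

average = 0.0
history = []
for _ in range(3):
    # Feed a constant value of 1.0 and watch the average approach it.
    average = ema_update(average, 1.0, decay=0.5)
    history.append(average)

print(history)
print(ema_update(0.0, 1.0, decay=0.98))
```

With `decay=0.5` the average covers half the remaining distance on every step, whereas with `decay=0.98` a single step moves it only 2% of the way toward the new value.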
1639
- #@markdown >If you want to keep starting from the same point, set `gen_seed` to a positive number. `-1` will make it random every time.
1640
- gen_seed = -1#@param {type:'number'}
1641
-
1642
-
1643
- init_image_in_drive = False #@param {type:"boolean"}
1644
- if init_image_in_drive and init_image:
1645
- init_image = '/content/drive/MyDrive/VQGAN_Output/' + init_image
1646
-
1647
- images_interval = args2.update#@param {type:"number"}
1648
-
1649
- #I think you should give "Free Thoughts on the Proceedings of the Continental Congress" a read, really funny and actually well-written, Hamilton presented it in a bad light IMO.
1650
-
1651
- batch_size = 1#@param {type:"number"}
1652
-
1653
- #@markdown ---
1654
-
1655
- #@markdown <font size="+1">🔮</font> **MSE Regularization** <font size="+1">🔮</font>
1656
- #Based off of this notebook: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj?usp=sharing - already in credits
1657
- use_mse = args2.mse #@param {type:"boolean"}
1658
- mse_images_interval = images_interval
1659
- mse_init_weight = 0.2#@param {type:"number"}
1660
- mse_decay_rate = 160#@param {type:"number"}
1661
- mse_epoches = 10#@param {type:"number"}
1662
- ##@param {type:"number"}
1663
-
1664
- #@markdown >Overwrites the usual values during the mse phase if included. If any value is 0, its normal counterpart is used instead.
1665
- mse_with_zeros = True #@param {type:"boolean"}
1666
- mse_step_size = 0.87 #@param {type:"number"}
1667
- mse_cutn = 42#@param {type:"number"}
1668
- mse_cut_pow = 0.75 #@param {type:"number"}
1669
-
1670
- #@markdown >normal_flip_optim flips between two optimizers during the normal (not MSE) phase. It can improve quality, but it's kind of experimental, use at your own risk.
1671
- normal_flip_optim = True #@param {type:"boolean"}
1672
- ##@markdown >Adding some TV may make the image blurrier but also helps to get rid of noise. A good value to try might be 0.1.
1673
- #tv_weight = 0.1 #@param {type:'number'}
1674
- #@markdown ---
1675
-
1676
- #@markdown >`altprompts` is a set of prompts that take in a different augmentation pipeline, and can have their own cut_pow. At the moment, the default "alt augment" settings flip the picture cutouts upside down before evaluating. This can be good for optical illusion images. If either cut_pow value is 0, it will use the same value as the normal prompts.
1677
- altprompts = "" #@param {type:"string"}
1678
- altprompt_mode = "flipped"
1679
- ##@param ["normal" , "flipped", "sideways"]
1680
- alt_cut_pow = 0 #@param {type:"number"}
1681
- alt_mse_cut_pow = 0 #@param {type:"number"}
1682
- #altprompt_type = "upside-down" #@param ['upside-down', 'as']
1683
-
1684
- ##@markdown ---
1685
- ##@markdown <font size="+1">💫</font> **Zooming and Moving** <font size="+1">💫</font>
1686
- zoom = False
1687
- ##@param {type:"boolean"}
1688
- zoom_speed = 100
1689
- ##@param {type:"number"}
1690
- zoom_frequency = 20
1691
- ##@param {type:"number"}
1692
-
1693
- #@markdown ---
1694
- #@markdown On an unrelated note, if you get any errors while running this, restart the runtime and run the first cell again. If that doesn't work either, message me on Discord (Philipuss#4066).
1695
-
1696
- model_names={'ImageNet 16384': 'vqgan_imagenet_f16_16384', 'ImageNet 1024': 'vqgan_imagenet_f16_1024', "Gumbel 8192": "gumbel_8192", "Sber Gumbel": "sber_gumbel", 'imagenet_cin': 'imagenet_cin', 'WikiArt 1024': 'wikiart_1024', 'WikiArt 16384': 'wikiart_16384', 'COCO-Stuff': 'coco', 'FacesHQ': 'faceshq', 'S-FLCKR': 'sflckr', 'WikiArt 7mil': 'wikiart_7mil', 'COCO 1 Stage': 'coco_1stage'}
1697
-
1698
- if template == "Better - Fast":
1699
- prompts = prompts + ". Detailed artwork. ArtStationHQ. unreal engine. 4K HD."
1700
- elif template == "Better - Slow":
1701
- prompts = prompts + ". Detailed artwork. Trending on ArtStation. unreal engine. | Rendered in Maya. " + prompts + ". 4K HD."
1702
- elif template == "Movie Poster":
1703
- prompts = prompts + ". Movie poster. Rendered in unreal engine. ArtStationHQ."
1704
- width = 400
1705
- height = 592
1706
- elif template == 'flag':
1707
- prompts = "A photo of a flag of the country " + prompts + " | Flag of " + prompts + ". White background."
1708
- #import cv2
1709
- #img = cv2.imread('templates/flag.png', 0)
1710
- #cv2.imwrite('templates/final_flag.png', add_noise(img))
1711
- init_image = "flag.png"
1712
- transparent_png = True
1713
- elif template == 'planet':
1714
- import cv2
1715
- img = cv2.imread('planet.png', 0)
1716
- cv2.imwrite('final_planet.png', add_noise(img))
1717
- prompts = "A photo of the planet " + prompts + ". Planet in the middle with black background. | The planet of " + prompts + ". Photo of a planet. Black background. Trending on ArtStation. | Colorful."
1718
- init_image = "final_planet.png"
1719
- elif template == 'creature':
1720
- #import cv2
1721
- #img = cv2.imread('templates/planet.png', 0)
1722
- #cv2.imwrite('templates/final_planet.png', add_noise(img))
1723
- prompts = "A photo of a creature with " + prompts + ". Animal in the middle with white background. | The creature has " + prompts + ". Photo of a creature/animal. White background. Detailed image of a creature. | White background."
1724
- init_image = "creature.png"
1725
- #transparent_png = True
1726
- elif template == 'Detailed':
1727
- prompts = prompts + ", by Puer Udger. Detailed artwork, trending on artstation. 4K HD, realism."
1728
- flavor = "cumin"
1729
- elif template == "human":
1730
- init_image = "human.png"
1731
- elif template == "Realistic":
1732
- cutn = 200
1733
- step_size = 0.03
1734
- cut_pow = 0.2
1735
- flavor = "holywater"
1736
- elif template == "Consistent Creativity":
1737
- flavor = "cumin"
1738
- cut_pow = 0.01
1739
- cutn = 136
1740
- step_size = 0.08
1741
- mse_step_size = 0.41
1742
- mse_cut_pow = 0.3
1743
- ema_val = 0.99
1744
- normal_flip_optim = False
1745
- elif template == "Smooth":
1746
- flavor = "wyvern"
1747
- step_size = 0.10
1748
- cutn = 120
1749
- normal_flip_optim = False
1750
- tv_weight = 10
1751
- elif template == "Subtle MSE":
1752
- mse_init_weight = 0.07
1753
- mse_decay_rate = 130
1754
- mse_step_size = 0.2
1755
- mse_cutn = 100
1756
- mse_cut_pow = 0.6
1757
- elif template == "Balanced":
1758
- cutn = 130
1759
- cut_pow = 1
1760
- step_size = 0.16
1761
- final_step_size = 0
1762
- ema_val = 0.98
1763
- mse_init_weight = 0.2
1764
- mse_decay_rate = 130
1765
- mse_with_zeros = True
1766
- mse_step_size = 0.9
1767
- mse_cutn = 50
1768
- mse_cut_pow = 0.8
1769
- normal_flip_optim = True
1770
- elif template == "Size: Square":
1771
- width = 450
1772
- height = 450
1773
- elif template == "Size: Landscape":
1774
- width = 480
1775
- height = 336
1776
- elif template == "Size: Poster":
1777
- width = 336
1778
- height = 480
1779
- elif template == "Negative Prompt":
1780
- prompts = prompts.replace(":", ":-")
1781
- prompts = prompts.replace(":--", ":")
1782
- elif template == "Hyper Fast Results":
1783
- step_size = 1
1784
- ema_val = 0.3
1785
- cutn = 30
1786
- elif template == "Better Quality":
1787
- prompts = prompts + ":1 | Watermark, blurry, cropped, confusing, cut, incoherent:-1"
1788
-
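The "Negative Prompt" template above flips the sign of every prompt weight with two string replacements. The sketch below isolates that trick so its effect is easy to check; the function name `negate_weights` is hypothetical, while the two `replace` calls are taken from the template branch.

```python
def negate_weights(prompts):
    """Flip the sign of every ':weight' suffix using two replacements."""
    flipped = prompts.replace(":", ":-")  # ':1' -> ':-1', ':-0.5' -> ':--0.5'
    return flipped.replace(":--", ":")    # ':--0.5' -> ':0.5'

print(negate_weights("a castle:1 | blurry:-0.5"))
```

The replacement acts on every colon, so a prompt without weights is returned unchanged, and a colon inside the prompt text itself also gains a `-` after it.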
1789
- mse_decay = 0
1790
-
1791
- if not use_mse:
1792
- mse_init_weight = 0.
1793
- else:
1794
- mse_decay = mse_init_weight / mse_epoches
1795
-
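The line `mse_decay = mse_init_weight / mse_epoches` suggests that the MSE weight is reduced in equal steps until it reaches zero. The sketch below shows that schedule under an explicit assumption: that the weight is lowered by `mse_decay` once per epoch. How `ModelHost` actually consumes `mse_decay` is not visible in this diff, and `mse_weight_schedule` is a hypothetical name.

```python
def mse_weight_schedule(init_weight=0.2, epochs=10):
    """Assumed linear decay: subtract init_weight / epochs once per epoch."""
    decay = init_weight / epochs
    weights = []
    weight = init_weight
    for _ in range(epochs):
        weights.append(weight)
        weight = max(0.0, weight - decay)  # never drop below zero
    return decay, weights

decay, weights = mse_weight_schedule(0.2, 10)
print(decay)
print([round(w, 3) for w in weights])
```

With the defaults in this section (`mse_init_weight = 0.2`, `mse_epoches = 10`), each step removes 0.02 from the weight.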
1796
- if os.path.isdir('/content/drive') == False:
1797
- if save_to_drive == True or init_image_in_drive == True:
1798
- drive.mount('/content/drive')
1799
-
1800
- if seed == -1:
1801
- seed = None
1802
- if init_image == "None":
1803
- init_image = None
1804
- if target_images == "None" or not target_images:
1805
- target_images = []
1806
- else:
1807
- target_images = target_images.split("|")
1808
- target_images = [image.strip() for image in target_images]
1809
-
1810
- prompts = [phrase.strip() for phrase in prompts.split("|")]
1811
- if prompts == ['']:
1812
- prompts = []
1813
-
1814
- altprompts = [phrase.strip() for phrase in altprompts.split("|")]
1815
- if altprompts == ['']:
1816
- altprompts = []
1817
-
1818
- if mse_images_interval == 0: mse_images_interval = images_interval
1819
- if mse_step_size == 0: mse_step_size = step_size
1820
- if mse_cutn == 0: mse_cutn = cutn
1821
- if mse_cut_pow == 0: mse_cut_pow = cut_pow
1822
- if alt_cut_pow == 0: alt_cut_pow = cut_pow
1823
- if alt_mse_cut_pow == 0: alt_mse_cut_pow = mse_cut_pow
1824
-
1825
- augs = nn.Sequential(
1826
- K.RandomHorizontalFlip(p=0.5),
1827
- K.RandomSharpness(0.3,p=0.4),
1828
- K.RandomGaussianBlur((3,3),(4.5,4.5),p=0.3),
1829
- #K.RandomGaussianNoise(p=0.5),
1830
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
1831
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
1832
- K.RandomPerspective(0.2,p=0.4, ),
1833
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
1834
- K.RandomGrayscale(p=0.1),
1835
- )
1836
-
1837
-
1838
- if altprompt_mode == "normal":
1839
- altaugs = nn.Sequential(
1840
- K.RandomRotation(degrees=90.0, return_transform=True),
1841
- K.RandomHorizontalFlip(p=0.5),
1842
- K.RandomSharpness(0.3,p=0.4),
1843
- K.RandomGaussianBlur((3,3),(4.5,4.5),p=0.3),
1844
- #K.RandomGaussianNoise(p=0.5),
1845
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
1846
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
1847
- K.RandomPerspective(0.2,p=0.4, ),
1848
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
1849
- K.RandomGrayscale(p=0.1),)
1850
- elif altprompt_mode == "flipped":
1851
- altaugs = nn.Sequential(
1852
- K.RandomHorizontalFlip(p=0.5),
1853
- #K.RandomRotation(degrees=90.0),
1854
- K.RandomVerticalFlip(p=1),
1855
- K.RandomSharpness(0.3,p=0.4),
1856
- K.RandomGaussianBlur((3,3),(4.5,4.5),p=0.3),
1857
- #K.RandomGaussianNoise(p=0.5),
1858
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
1859
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
1860
- K.RandomPerspective(0.2,p=0.4, ),
1861
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
1862
- K.RandomGrayscale(p=0.1),)
1863
- elif altprompt_mode == "sideways":
1864
- altaugs = nn.Sequential(
1865
- K.RandomHorizontalFlip(p=0.5),
1866
- #K.RandomRotation(degrees=90.0),
1867
- K.RandomVerticalFlip(p=1),
1868
- K.RandomSharpness(0.3,p=0.4),
1869
- K.RandomGaussianBlur((3,3),(4.5,4.5),p=0.3),
1870
- #K.RandomGaussianNoise(p=0.5),
1871
- #K.RandomElasticTransform(kernel_size=(33, 33), sigma=(7,7), p=0.2),
1872
- K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'), # padding_mode=2
1873
- K.RandomPerspective(0.2,p=0.4, ),
1874
- K.ColorJitter(hue=0.01, saturation=0.01, p=0.7),
1875
- K.RandomGrayscale(p=0.1),)
1876
-
1877
-
1878
-
1879
-
1880
- if multiple_prompt_batches:
1881
- prompts_all = str(prompts).split("~")
1882
- else:
1883
- prompts_all = prompts
1884
- multiple_prompt_batches_iter = iterations
1885
-
1886
- if multiple_prompt_batches:
1887
- mtpl_prmpts_btchs = len(prompts_all)
1888
- else:
1889
- mtpl_prmpts_btchs = 1
1890
-
1891
- #print(mtpl_prmpts_btchs)
1892
-
1893
- steps_path = './'
1894
- zoom_path = './'
1895
-
1896
- path = './'
1897
-
1898
- iterations = multiple_prompt_batches_iter
1899
-
1900
- for pr in range(0, mtpl_prmpts_btchs):
1901
- #print(prompts_all[pr].replace('[\'', '').replace('\']', ''))
1902
- if multiple_prompt_batches:
1903
- prompts = prompts_all[pr].replace('[\'', '').replace('\']', '')
1904
-
1905
- if zoom:
1906
- mdf_iter = round(iterations/zoom_frequency)
1907
- else:
1908
- mdf_iter = 2
1909
- zoom_frequency = iterations
1910
-
1911
- for iter in range(1, mdf_iter):
1912
- if zoom:
1913
- if iter != 0:
1914
- image = Image.open('progress.png')
1915
- area = (0, 0, width-zoom_speed, height-zoom_speed)
1916
- cropped_img = image.crop(area)
1917
- cropped_img.show()
1918
-
1919
- new_image = cropped_img.resize((width, height))
1920
- new_image.save('zoom.png')
1921
- init_image = 'zoom.png'
1922
-
1923
- args = argparse.Namespace(
1924
- prompts=prompts,
1925
- altprompts=altprompts,
1926
- image_prompts=target_images,
1927
- noise_prompt_seeds=[],
1928
- noise_prompt_weights=[],
1929
- size=[width, height],
1930
- init_image=init_image,
1931
- png=transparent_png,
1932
- init_weight= mse_init_weight,
1933
- vqgan_model=model_names[model],
1934
- step_size=step_size,
1935
- start_step_size = start_step_size,
1936
- final_step_size = final_step_size,
1937
- cutn=cutn,
1938
- cut_pow=cut_pow,
1939
- mse_cutn = mse_cutn,
1940
- mse_cut_pow = mse_cut_pow,
1941
- mse_step_size = mse_step_size,
1942
- display_freq=images_interval,
1943
- mse_display_freq = mse_images_interval,
1944
- max_iterations=zoom_frequency,
1945
- mse_end = 0,
1946
- seed=seed,
1947
- folder_name=folder_name,
1948
- save_to_drive=save_to_drive,
1949
- mse_decay_rate = mse_decay_rate,
1950
- mse_decay = mse_decay,
1951
- mse_with_zeros = mse_with_zeros,
1952
- normal_flip_optim = normal_flip_optim,
1953
- ema_val = ema_val,
1954
- augs = augs,
1955
- altaugs = altaugs,
1956
- alt_cut_pow = alt_cut_pow,
1957
- alt_mse_cut_pow = alt_mse_cut_pow,
1958
- is_gumbel = is_gumbel,
1959
- clip_model = clip_model,
1960
- clip_model2 = clip_model2,
1961
- clip_model3 = clip_model3,
1962
- clip_model4 = clip_model4,
1963
- clip_model5 = clip_model5,
1964
- clip_model6 = clip_model6,
1965
- gen_seed = gen_seed)
1966
-
1967
- mh = ModelHost(args)
1968
- x = 0
1969
-
1970
- for x in range(batch_size):
1971
- mh.setup_model(x)
1972
- last_iter = mh.run(x)
1973
- #print(last_iter)
1974
- image_data = Image.open(args2.image_file)
1975
- return image_data
1976
-
1977
- if batch_size != 1:
1978
- #clear_output()
1979
- #print("===============================================================================")
1980
- q = 0
1981
- while q < batch_size:
1982
- display(Image('/content/' + folder_name + "/" + str(q) + '.png'))
1983
- #print("Image" + str(q) + '.png')
1984
- q += 1
1985
-
1986
- if zoom:
1987
- files = os.listdir(steps_path)
1988
- for index, file in enumerate(files):
1989
- os.rename(os.path.join(steps_path, file),os.path.join(steps_path,''.join([str(index + 1 + zoom_frequency * iter),'.png'])))
1990
- index = index+1
1991
-
1992
- from pathlib import Path
1993
- import shutil
1994
-
1995
- src_path = steps_path
1996
- trg_path = zoom_path
1997
-
1998
- for src_file in range(1, mdf_iter):
1999
- shutil.move(os.path.join(src_path,src_file),trg_path)
2000
-
2001
- ##################### START GRADIO HERE ############################
2002
- image = gr.outputs.Image(type="pil", label="Your result")
2003
- iface = gr.Interface(
2004
- fn=run_all,
2005
- inputs=[
2006
- gr.inputs.Textbox(label="Prompt - try adding increments to your prompt such as 'oil on canvas', 'a painting', 'a book cover'",default="chalk pastel drawing of a dog wearing a funny hat"),
2007
- gr.inputs.Slider(label="Steps - more steps can increase quality but will take longer to generate",default=300,maximum=300,minimum=1,step=1),
2008
- gr.inputs.Dropdown(label="Style",choices=["none","Balanced","Detailed","Consistent Creativity","Realistic","Smooth","Subtle MSE","Hyper Fast Results"]),
2009
- gr.inputs.Radio(label="Width", choices=[32,64,128,256,512],default=256),
2010
- gr.inputs.Radio(label="Height", choices=[32,64,128,256,512],default=256),
2011
- ],
2012
- outputs=image,
2013
- title="Generate images from text with VQGAN+CLIP",
2014
- #description="<div>By typing a prompt and pressing submit you can generate images based on this prompt. <a href='https://github.com/CompVis/latent-diffusion' target='_blank'>Latent Diffusion</a> is a text-to-image model created by <a href='https://github.com/CompVis' target='_blank'>CompVis</a>, trained on the <a href='https://laion.ai/laion-400-open-dataset/'>LAION-400M dataset.</a><br>This UI to the model was assembled by <a style='color: rgb(245, 158, 11);font-weight:bold' href='https://twitter.com/multimodalart' target='_blank'>@multimodalart</a></div>",
2015
- #article="<h4 style='font-size: 110%;margin-top:.5em'>Biases acknowledgment</h4><div>Despite how impressive it is to turn text into images, be aware that this model may output content that reinforces or exacerbates societal biases. According to the <a href='https://arxiv.org/abs/2112.10752' target='_blank'>Latent Diffusion paper</a>:<i> \"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\"</i>. The model was trained on an unfiltered version of the LAION-400M dataset, which scraped non-curated image-text pairs from the internet (the exception being the removal of illegal content) and is meant to be used for research purposes, such as this one. <a href='https://laion.ai/laion-400-open-dataset/' target='_blank'>You can read more on LAION's website</a></div><h4 style='font-size: 110%;margin-top:1em'>Who owns the images produced by this demo?</h4><div>Definitely not me! Probably you do. I say probably because the copyright discussion about AI-generated art is ongoing. So <a href='https://www.theverge.com/2022/2/21/22944335/us-copyright-office-reject-ai-generated-art-recent-entrance-to-paradise' target='_blank'>it may be the case that everything produced here falls automatically into the public domain</a>. But in any case it is either yours or is in the public domain.</div>"
2016
- )
2017
- iface.launch(enable_queue=True)