Is the point of this to generate images at a 256x256 resolution?
There's not a lot of practical information in the model card, so I was wondering whether this is meant to make smaller images, which are faster, and what the ideal resolution is. On Reddit you said "1/2 smaller!", which would be 384x384. Also, it would make sense to link to your ckpt file in the model card or just include it in the file list. And is there a token phrase?
Not exactly. The main purpose is to introduce a small but effective model based on Stable Diffusion. "1/2 smaller" means that in this model we set layers_per_block=1 (the original Stable Diffusion UNet uses 2), so the UNet part is about half the size of the original. The resolution of the generated images does not matter here.
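To see roughly why layers_per_block=1 shrinks the UNet, here is a small sketch that counts ResNet and attention blocks for a given layers_per_block. It assumes the standard Stable Diffusion v1 layout used by diffusers' UNet2DConditionModel (three cross-attention down blocks plus one plain down block, the mirror image on the way up, and one cross-attention mid block); the function and variable names are illustrative only. It counts blocks, not parameters: the real parameter savings depend on the channel width at each level, so this is only a rough proxy for the "about 1/2" figure.

```python
# Assumed SD v1-style layout (as in diffusers' UNet2DConditionModel defaults for SD v1).
DOWN_BLOCKS = ["CrossAttnDownBlock2D"] * 3 + ["DownBlock2D"]
UP_BLOCKS = ["UpBlock2D"] + ["CrossAttnUpBlock2D"] * 3


def count_blocks(layers_per_block: int) -> dict:
    """Count ResNet and attention (transformer) blocks for a given layers_per_block."""
    resnets = 0
    attentions = 0
    # Each down block holds `layers_per_block` ResNets,
    # plus one attention block per ResNet if it is a cross-attention block.
    for block in DOWN_BLOCKS:
        resnets += layers_per_block
        if block.startswith("CrossAttn"):
            attentions += layers_per_block
    # Each up block holds `layers_per_block + 1` ResNets,
    # one per skip connection arriving from the matching down level.
    for block in UP_BLOCKS:
        resnets += layers_per_block + 1
        if block.startswith("CrossAttn"):
            attentions += layers_per_block + 1
    # The mid block does not depend on layers_per_block: 2 ResNets around 1 attention block.
    resnets += 2
    attentions += 1
    return {"resnets": resnets, "attentions": attentions}


original = count_blocks(layers_per_block=2)
small = count_blocks(layers_per_block=1)
print("original:", original)  # original: {'resnets': 22, 'attentions': 16}
print("small:   ", small)     # small:    {'resnets': 14, 'attentions': 10}
```

The mid block and the extra ResNet in each up block stay fixed, which is why halving layers_per_block removes a large share of the blocks but not exactly half of them.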
Currently, we haven't provided a single-file ckpt, but you can find the pytorch.bin file for each component here.