Running on lower VRAM
I heard this can be run on a card with less than 10 GB of VRAM. I have a 3070 with 8 GB, and when I tried to generate a picture I ran out of VRAM. Are there any settings I can change to get it running?
It's really strange. I've run the stable-diffusion-v-1-3-diffusers model on a GeForce RTX 2060 SUPER (8 GB VRAM). It generates one image in ~45 seconds with 100 steps perfectly.
Are you generating several samples or just one? It may run out of VRAM if you try to generate several samples in one go.
I tried running txt2img.py with the default settings. What do you use to get it to work on your 2060?
Again, I am using the diffusers model, so the results may differ.
My settings are: n_samples=1, num_inference_steps=50, guidance_scale=7
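For anyone who wants to reproduce that, here is a minimal sketch of those settings with the diffusers pipeline (the model id and prompt are just placeholders, and the exact output attribute can vary between diffusers versions):

import torch
from diffusers import StableDiffusionPipeline

# Half precision (fp16) roughly halves the memory footprint
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# One sample at a time (the equivalent of n_samples=1), 50 steps, guidance scale 7
result = pipe("a placeholder prompt", num_inference_steps=50, guidance_scale=7)
result.images[0].save("out.png")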
Wallpaper Engine or any other GPU-intensive apps can eat into available VRAM if they're running, so consider closing them while generating.
https://twitter.com/EMostaque/status/1557862289394515973 This tweet says the current model can run in 5.1 GB of VRAM, but my 6 GB GPU is giving me an out-of-memory error. Any suggestions : (?
By default the code processes 3 images at a time. To reduce the memory footprint, set --n_samples to 1. You can also use --W and --H to reduce the size of the generated image (512x512 by default) and thereby reduce the memory footprint.
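For example, an invocation along these lines (the prompt is a placeholder; double-check the flag names against the script's --help in your checkout):

python scripts/txt2img.py --prompt "a placeholder prompt" --n_samples 1 --W 448 --H 448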
On a 3060 12GB I can generate images up to 708x512 using n_samples=1
With this version of the weights, on an 8GB card I can generate 1 image if I go down to 448x448, and the quality seems fine. If I try much smaller, the images look broken.
512x512 does not work with this setup as-is for me, but I suspect it can be squeezed further in the implementation somewhere.
With the diffusers code I can generate the full 512x512 on the same card.
I have created a modified version of the repo that can run on lower VRAM at the cost of slightly longer inference time. It can generate a 512x512 image on a 6 GB GPU (RTX 2060 in my case) in 75 seconds. Please feel free to check it out and give suggestions. https://github.com/basujindal/stable-diffusion
Edit: I have added support for batched inference. It can now generate images in batches, which reduces inference time to 40 seconds per image on a 6 GB RTX 2060 when using a batch size of 6 : )
Hi basujindal, is there any setting I can change in the optimizedSD files that would lower the bar to a 4 GB GPU? Using your files I can run a 128x128 image but not higher. I don't mind a long inference time; I just want to run a higher resolution with my poor GTX 970.
Hi, I am trying to modify the code to make it run on lower VRAM, but it seems complicated. Will let you know if it works.
Hi, I have updated the repo. Now you can generate 512x512 images in under 4 GB. Cheers!
Try also tinkering with downsampling. I've integrated the option into my colab if you want to play with it (via TXT2IMG). If you're running it locally, just use the arg.
https://colab.research.google.com/drive/1jUwJ0owjigpG-9m6AI_wEStwimisUE17#scrollTo=9QnhfmAM0t-X
I haven't played with the option much, but thought it could be a good approach for trying to reduce VRAM usage.
You can also remove the VAE encoder part to save some more RAM before moving the models to the GPU, as it's not needed for the txt2img pipeline (only the decoder is used to turn latents into images). E.g.
import torch
from diffusers import StableDiffusionPipeline

# Load in half precision (fp16) to halve the memory footprint
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True, torch_dtype=torch.float16)

# The VAE encoder is only used for img2img; txt2img only decodes latents
del pipe.vae.encoder
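After that, the trimmed pipeline can be moved to the GPU and used as usual. A rough sketch (the prompt is a placeholder, and the output attribute can differ between diffusers versions):

pipe = pipe.to("cuda")
result = pipe("a placeholder prompt", num_inference_steps=50, guidance_scale=7)
result.images[0].save("out.png")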