Stable Diffusion Model - PyTorch & Hugging Face Diffusers
This repository contains the implementation of the Stable Diffusion model using PyTorch and Hugging Face Diffusers. Stable Diffusion is a text-to-image generative model that leverages a diffusion process to generate high-quality, detailed images from textual descriptions.
Table of Contents
- Installation
- Requirements
- Model Overview
- Model Architecture
- Usage
- Training
- Inference
- Examples
- Acknowledgments
- License
Installation
To get started, you'll need to clone this repository and install the required dependencies. We recommend using a virtual environment to avoid conflicts.
git clone https://github.com/the-antique-piece/stable_diffusion.git
cd stable_diffusion
# Create and activate a virtual environment (optional)
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install required dependencies
pip install -r requirements.txt
Requirements
- Python 3.8+
- PyTorch
- Hugging Face Diffusers
- Transformers
- Datasets
- Pillow (PIL)
To install all dependencies manually, you can run:
pip install torch diffusers transformers datasets pillow
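After installing, a quick sanity check confirms that PyTorch can see your GPU (CPU-only setups print False and still work, just more slowly):
import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if an NVIDIA GPU is usable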
Model Overview
Stable Diffusion is a latent diffusion model that is trained to denoise a latent representation of the image, conditioned on a text prompt. It operates by gradually reversing a noise process applied to the data during training, allowing it to generate images starting from pure noise.
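To make the reverse process concrete, here is a minimal sketch of the inference-time denoising loop using the Diffusers scheduler API; unet and text_embeddings are placeholders assumed to come from a loaded pipeline, and the latent shape corresponds to a 512x512 Stable Diffusion image:
import torch
from diffusers import DDPMScheduler
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)          # run 50 denoising steps at inference time
latents = torch.randn(1, 4, 64, 64)  # start from pure noise in latent space
for t in scheduler.timesteps:
    with torch.no_grad():
        # placeholder: unet and text_embeddings come from a loaded pipeline
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
    # remove the noise predicted for this timestep
    latents = scheduler.step(noise_pred, t, latents).prev_sample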
This repository implements the following features:
- Text-to-image generation: Generate images based on a text prompt.
- Fine-tuning: Customize the model for specific datasets.
- Inference: Run the model on pre-trained weights for fast image generation.
Model Architecture
The Stable Diffusion model consists of:
- Variational Autoencoder (VAE) - Encodes images into latent space.
- U-Net - A denoising network that learns to reverse the noise process.
- Text Encoder - Encodes text prompts into latent space to guide image generation.
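In the standard Diffusers checkpoint layout these components live in subfolders (vae/, unet/, text_encoder/, tokenizer/), so they can also be loaded individually; weights_path below is a placeholder for a directory containing model_index.json:
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer
weights_path = "path/to/checkpoint"  # placeholder
vae = AutoencoderKL.from_pretrained(weights_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(weights_path, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(weights_path, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(weights_path, subfolder="tokenizer")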
Usage
Text-to-Image Generation
Once the environment is set up, you can generate images from text prompts as follows:
from diffusers import DiffusionPipeline
import torch
# torch_dtype=torch.float16 is omitted here so this quick start also runs on CPU
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion/stable-diffusion-v1")
# Use an NVIDIA GPU if available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)
image = pipeline("An image of a futuristic city where everything is perfect").images[0]
# image is a PIL image; call image.save("filename.png") to keep it
For more control over image generation, create a separate Python file, paste the code below into it, and run it from the virtual environment with `python python_script.py`.
If you are not running this model on an NVIDIA GPU, change torch_dtype to torch.float32.
from diffusers import DiffusionPipeline
import torch
# Provide the path to the directory that contains model_index.json
weights_path = "directory_path_to_model_index.json"
pipeline = DiffusionPipeline.from_pretrained(weights_path, torch_dtype=torch.float16)
# float16 weights require a CUDA device; switch to torch.float32 above if you are on CPU
pipeline = pipeline.to("cuda")
# Change the prompt to get different images; increase num_inference_steps for higher quality
prompt = 'a cat sitting on a windowsill, looking at the sunset'
height, width = 512, 512
num_inference_steps = 50
image = pipeline(prompt, height=height, width=width, num_inference_steps=num_inference_steps).images[0]
image.save("myimage.png")
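Two more knobs are worth knowing about: guidance_scale controls how strongly the image follows the prompt, and a seeded torch.Generator makes results reproducible. A sketch, continuing from the script above:
generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed for repeatable output
image = pipeline(prompt, num_inference_steps=num_inference_steps,
                 guidance_scale=7.5, generator=generator).images[0]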
Custom Model Weights
If you have custom model weights, load them into the pipeline:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("path/to/your/model").to("cuda")
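A quick smoke test with the custom weights (the prompt and filename are only illustrative):
image = pipe("a watercolor landscape at dawn").images[0]
image.save("custom_weights_sample.png")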
Training
This repository also supports fine-tuning the Stable Diffusion model on your own dataset. To prepare for training:
- Prepare Dataset: Ensure that your dataset is in a format compatible with Hugging Face's datasets library, as sketched after this list.
- Configure Training Parameters: Adjust hyperparameters such as learning rate, batch size, and number of epochs.
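As a sketch, an image folder can be loaded with the datasets library's imagefolder loader (the path is a placeholder, and your dataset may need a different loader):
from datasets import load_dataset
dataset = load_dataset("imagefolder", data_dir="/path/to/dataset")
print(dataset["train"][0])  # inspect one example to verify the format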
Fine-tuning Example
python train.py --dataset_path /path/to/dataset --output_dir /path/to/output --batch_size 8 --learning_rate 5e-5 --num_epochs 10
Training is handled by the train.py script, which supports distributed training for large datasets and multiple GPUs.
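If train.py uses the standard PyTorch distributed entry points (an assumption; check the script before relying on this), a multi-GPU launch could look like:
torchrun --nproc_per_node=4 train.py --dataset_path /path/to/dataset --output_dir /path/to/output --batch_size 8 --learning_rate 5e-5 --num_epochs 10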
Inference
To run inference on a trained model, use the inference.py script:
python inference.py --model_path /path/to/trained/model --prompt "a futuristic city skyline at sunset"
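If the training run saved the model with save_pretrained (an assumption about train.py's output format), the trained directory can also be loaded directly from Python:
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("/path/to/trained/model").to("cuda")
image = pipe("a futuristic city skyline at sunset").images[0]
image.save("skyline.png")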
Examples
Here are some example prompts and the corresponding generated images:
Acknowledgments
This implementation is based on the Stable Diffusion model by CompVis and Stability AI and utilizes the Hugging Face Diffusers library.
License
This project is licensed under the terms of the Apache 2.0 License.