README.md · TMElyralab/lyraSD at b8c960d505d2bdd52be10a46eac9d21446d5a4e9

metadata

license: creativeml-openrail-m
language:
  - en
tags:
  - art
  - Stable Diffusion

Model Card for lyraSD

We consider the Diffusers as the much more extendable framework for the SD ecosystem. Therefore, we have made a pivot to Diffusers, leading to a complete update of lyraSD.

lyraSD is currently the fastest Stable Diffusion model that can 100% align the outputs of Diffusers available, boasting an inference cost of only 0.36 seconds for a 512x512 image, accelerating the process up to 50% faster than the original version.

Among its main features are:

All Commonly used SD1.5 and SDXL pipelines
- Text2Img
- Img2Img
- Inpainting
- ControlNetText2Img
- ControlNetImg2Img
- IpAdapterText2Img
Fast ControlNet Hot Swap: Can hot swap a ControlNet model weights within 0.6s
Fast LoRA Hot Swap: Can hot swap a Lora within 0.14s
100% likeness to diffusers output
Supported Devices: Any GPU with SM version >= 80. For example, Nvidia Nvidia Ampere architecture (A2, A10, A16, A30, A40, A100), RTX 4090, 3080 and etc.

Speed

test environment

Device: Nvidia A100 40G
Nvidia driver version: 525.105.17
Nvidia cuda version: 12.0
Percision:fp16
Steps: 20
Sampler: EulerA

SD1.5 Text2Img Performance

SD1.5 ControlNet-Text2Img Performance

SDXL Text2Img Performance

SDXL ControlNet-Text2Img Performance

SD Model Load Performance

Model Sources

SD1.5

Checkpoint: https://civitai.com/models/7371/rev-animated
ControlNet: https://huggingface.co/lllyasviel/sd-controlnet-canny
Lora: https://civitai.com/models/18323?modelVersionId=46846

SDXL

Checkpoint: https://civitai.com/models/43977?modelVersionId=227916
ControlNet: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0
Lora: https://civitai.com/models/18323?modelVersionId=46846

SD1.5 Text2Img Uses

import torch
import time

from lyrasd_model import LyraSdTxt2ImgPipeline

# 存放模型文件的路径，应该包含一下结构(和diffusers一致)：
#   1. clip 模型
#   2. 转换好的优化后的 unet 模型，放入其中的 unet_bins 文件夹
#   3. vae 模型
#   4. scheduler 配置

# LyraSD 的 C++ 编译动态链接库，其中包含 C++ CUDA 计算的细节
lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu11_sm80.so"
model_path = "./models/lyrasd_rev_animated"
lora_path = "./models/lyrasd_xiaorenshu_lora"

# 构建 Txt2Img 的 Pipeline
model = LyraSdTxt2ImgPipeline(model_path, lib_path)

# load lora
# lora model path, name，lora strength
model.load_lora_v2(lora_path, "xiaorenshu", 0.4)

# 准备应用的输入和超参数
prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
height, width = 512, 512
steps = 30
guidance_scale = 7
generator = torch.Generator().manual_seed(123)
num_images = 1

start = time.perf_counter()
# 推理生成
images = model(prompt, height, width, steps,
        guidance_scale, negative_prompt, num_images,
        generator=generator)
print("image gen cost: ",time.perf_counter() - start)
# 存储生成的图片
for i, image in enumerate(images):
    image.save(f"outputs/res_txt2img_lora_{i}.png")

# unload lora，      lora’s name,  clear lora cache
model.unload_lora_v2("xiaorenshu", True)

SDXL Text2Img Uses

import torch
import time

from lyrasd_model import LyraSdXLTxt2ImgPipeline

# 存放模型文件的路径，应该包含一下结构：
#   1. clip 模型
#   2. 转换好的优化后的 unet 模型，放入其中的 unet_bins 文件夹
#   3. vae 模型
#   4. scheduler 配置

# LyraSD 的 C++ 编译动态链接库，其中包含 C++ CUDA 计算的细节
lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu11_sm80.so"
model_path = "./models/lyrasd_helloworldSDXL20Fp16"
lora_path = "./models/lyrasd_xiaorenshu_lora"

# 构建 Txt2Img 的 Pipeline
model = LyraSdXLTxt2ImgPipeline(model_path, lib_path)

# load lora
# lora model path, name，lora strength
model.load_lora_v2(lora_path, "xiaorenshu", 0.4)

# 准备应用的输入和超参数
prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
height, width = 512, 512
steps = 30
guidance_scale = 7
generator = torch.Generator().manual_seed(123)
num_images = 1

start = time.perf_counter()
# 推理生成
images = model( prompt,
                height=height,
                width=width,
                num_inference_steps=steps,
                num_images_per_prompt=1,
                guidance_scale=guidance_scale,
                negative_prompt=negative_prompt,
                generator=generator
                )
print("image gen cost: ",time.perf_counter() - start)
# 存储生成的图片
for i, image in enumerate(images):
    image.save(f"outputs/res_txt2img_xl_lora_{i}.png")

# unload lora，参数为 lora 的名字，是否清除 lora 缓存
model.unload_lora_v2("xiaorenshu", True)

Demo output

Text2Img

SD1.5 Text2Img

SD1.5 Text2Img with Lora

SDXL Text2Img

SDXL Text2Img with Lora

ControlNet Text2Img

Control Image

SD1.5 ControlNet Text2Img Output

SDXL ControlNet Text2Img Output

Docker Environment Recommendation

For Cuda 11.X: we recommend nvcr.io/nvidia/pytorch:22.12-py3
For Cuda 12.0: we recommend nvcr.io/nvidia/pytorch:23.02-py3

docker pull nvcr.io/nvidia/pytorch:23.02-py3
docker run --rm -it --gpus all -v ./:/lyraSD nvcr.io/nvidia/pytorch:23.02-py3

pip install -r requirements.txt
python txt2img_demo.py

Citation

@Misc{lyraSD_2023,
  author =       {Kangjian Wu, Zhengtao Wang, Yibo Lu, Haoxiong Su, Bin Wu},
  title =        {lyraSD: Accelerating Stable Diffusion with best flexibility},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraSD}},
  year =         {2024}
}

Report bug

start a discussion to report any bugs!--> https://huggingface.co/TMElyralab/lyraSD/discussions
report bug with a [bug] mark in the title.