---
license: creativeml-openrail-m
language:
- en
tags:
- art
- Stable Diffusion
---
## Model Card for lyraSD

We consider Diffusers the more extensible framework for the SD ecosystem, so we have **pivoted to Diffusers** and updated lyraSD completely.

lyraSD is currently the **fastest Stable Diffusion model** available whose outputs align 100% with those of **Diffusers**. It generates a 512x512 image in only **0.36 seconds**, up to **50% faster** than the original version.

Its main features are:

- **All commonly used** SD1.5 and SDXL pipelines
  - Text2Img
  - Img2Img
  - Inpainting
  - ControlNetText2Img
  - ControlNetImg2Img
  - IpAdapterText2Img
- **Fast ControlNet Hot Swap**: hot swaps a ControlNet model's weights within 0.6s
- **Fast LoRA Hot Swap**: hot swaps a LoRA within 0.14s
- 100% likeness to Diffusers output
- Supported devices: any GPU with SM version >= 80, for example the Nvidia Ampere architecture (A2, A10, A16, A30, A40, A100), RTX 4090, RTX 3080, etc.
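The SM requirement above can be checked before loading the library. The following is a minimal sketch, assuming PyTorch is installed; the helper names `is_supported` and `check_current_gpu` are illustrative and not part of lyraSD.

```python
def is_supported(major: int, minor: int, min_sm: int = 80) -> bool:
    """Return True if compute capability major.minor meets the SM requirement."""
    return major * 10 + minor >= min_sm


def check_current_gpu(device_index: int = 0):
    """Return True/False for the given GPU, or None if it cannot be queried."""
    try:
        import torch  # imported lazily so the pure helper works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple, e.g. (8, 0) for A100
    major, minor = torch.cuda.get_device_capability(device_index)
    return is_supported(major, minor)


if __name__ == "__main__":
    print("GPU supported:", check_current_gpu())
```

An A100 reports compute capability 8.0 (SM 80), so it passes the check; a card with compute capability 7.5 would not.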

## Speed

### Test environment

- Device: Nvidia A100 40G
- Nvidia driver version: 525.105.17
- Nvidia CUDA version: 12.0
- Precision: fp16
- Steps: 20
- Sampler: EulerA
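This card does not describe how the latency figures were measured. As a hedged illustration only, one common way to time GPU inference is to run a few warm-up calls and then average several timed calls, synchronizing the device around each one. The `benchmark` helper below is hypothetical and not part of lyraSD.

```python
import time
from statistics import mean


def benchmark(fn, warmup=2, iters=5, sync=None):
    """Time fn() and return (average_seconds, best_seconds).

    sync is an optional callable that blocks until pending device work has
    finished (for example torch.cuda.synchronize); without it, asynchronous
    GPU work may not be fully included in the measured time.
    """
    for _ in range(warmup):  # warm-up runs are executed but not timed
        fn()
    timings = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return mean(timings), min(timings)


if __name__ == "__main__":
    avg, best = benchmark(lambda: sum(range(100_000)))
    print(f"avg {avg:.6f}s, best {best:.6f}s")
```

To time a pipeline, `fn` could be a lambda that calls the model with a fixed prompt and settings, with `sync=torch.cuda.synchronize`.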

### SD1.5 Text2Img Performance
![Alt text](images/sd_txt2img.png)

### SD1.5 ControlNet-Text2Img Performance
![Alt text](images/sd_controlnet_txt2img.png)

### SDXL Text2Img Performance
![Alt text](images/sd_txt2img.png)

### SDXL ControlNet-Text2Img Performance
![Alt text](images/sdxl_controlnet_txt2img.png)

### SD Model Load Performance
![Alt text](images/model_load_performance.png)

## Model Sources

SD1.5
- **Checkpoint:** https://civitai.com/models/7371/rev-animated
- **ControlNet:** https://huggingface.co/lllyasviel/sd-controlnet-canny
- **LoRA:** https://civitai.com/models/18323?modelVersionId=46846

SDXL
- **Checkpoint:** https://civitai.com/models/43977?modelVersionId=227916
- **ControlNet:** https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0
- **LoRA:** https://civitai.com/models/18323?modelVersionId=46846

## SD1.5 Text2Img Usage

```python
import torch
import time

from lyrasd_model import LyraSdTxt2ImgPipeline

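# Path to the model files. It should contain the following structure (same as Diffusers):
#   1. CLIP model
#   2. converted and optimized UNet model, placed in the unet_bins folder
#   3. VAE model
#   4. scheduler config

# LyraSD's compiled C++ shared library, which contains the C++/CUDA compute implementation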
lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu11_sm80.so"
model_path = "./models/lyrasd_rev_animated"
lora_path = "./models/lyrasd_xiaorenshu_lora"

# Build the Txt2Img pipeline
model = LyraSdTxt2ImgPipeline(model_path, lib_path)

# Load a LoRA
# Arguments: LoRA model path, name, LoRA strength
model.load_lora_v2(lora_path, "xiaorenshu", 0.4)

# Prepare the inputs and hyperparameters
prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
height, width = 512, 512
steps = 30
guidance_scale = 7
generator = torch.Generator().manual_seed(123)
num_images = 1

start = time.perf_counter()
# Run inference
images = model(prompt, height, width, steps,
        guidance_scale, negative_prompt, num_images,
        generator=generator)
print("image gen cost: ",time.perf_counter() - start)
# Save the generated images (the outputs/ directory must already exist)
for i, image in enumerate(images):
    image.save(f"outputs/res_txt2img_lora_{i}.png")

# Unload the LoRA; arguments: LoRA name, whether to clear the LoRA cache
model.unload_lora_v2("xiaorenshu", True)
```

## SDXL Text2Img Usage

```python
import torch
import time

from lyrasd_model import LyraSdXLTxt2ImgPipeline

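# Path to the model files. It should contain the following structure:
#   1. CLIP model
#   2. converted and optimized UNet model, placed in the unet_bins folder
#   3. VAE model
#   4. scheduler config

# LyraSD's compiled C++ shared library, which contains the C++/CUDA compute implementation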
lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu11_sm80.so"
model_path = "./models/lyrasd_helloworldSDXL20Fp16"
lora_path = "./models/lyrasd_xiaorenshu_lora"

# Build the Txt2Img pipeline
model = LyraSdXLTxt2ImgPipeline(model_path, lib_path)

# Load a LoRA
# Arguments: LoRA model path, name, LoRA strength
model.load_lora_v2(lora_path, "xiaorenshu", 0.4)

# Prepare the inputs and hyperparameters
prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
height, width = 512, 512
steps = 30
guidance_scale = 7
generator = torch.Generator().manual_seed(123)
num_images = 1

start = time.perf_counter()
# Run inference
images = model( prompt,
                height=height,
                width=width,
                num_inference_steps=steps,
                num_images_per_prompt=num_images,
                guidance_scale=guidance_scale,
                negative_prompt=negative_prompt,
                generator=generator
                )
print("image gen cost: ",time.perf_counter() - start)
# Save the generated images (the outputs/ directory must already exist)
for i, image in enumerate(images):
    image.save(f"outputs/res_txt2img_xl_lora_{i}.png")

# Unload the LoRA; arguments: LoRA name, whether to clear the LoRA cache
model.unload_lora_v2("xiaorenshu", True)
```
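The comments in the examples say the model directory should contain the CLIP, UNet (`unet_bins`), VAE, and scheduler components. A quick check before building a pipeline can catch an incomplete download. This is only a sketch: apart from `unet_bins`, the subfolder names below are assumptions based on the usual Diffusers layout and should be adjusted to match the actual model directory.

```python
from pathlib import Path

# Assumed subfolder names; only "unet_bins" is named in this card.
ASSUMED_SUBDIRS = ("text_encoder", "unet_bins", "vae", "scheduler")


def missing_components(model_path, expected=ASSUMED_SUBDIRS):
    """Return the expected subfolders that are absent under model_path."""
    root = Path(model_path)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_components("./models/lyrasd_helloworldSDXL20Fp16")
    if missing:
        print("Missing components:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```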

## Demo output

### Text2Img
#### SD1.5 Text2Img
![text2img_demo](./outputs/res_txt2img_0.png)

#### SD1.5 Text2Img with LoRA
![text2img_demo](./outputs/res_txt2img_lora_0.png)

#### SDXL Text2Img
![text2img_demo](./outputs/res_sdxl_txt2img_0.png)

#### SDXL Text2Img with LoRA
![text2img_demo](./outputs/res_sdxl_txt2img_lora_0.png)


<!-- ### Img2Img

#### Img2Img input
<img src="https://chuangxin-research-1258344705.cos.ap-guangzhou.myqcloud.com/share/files/seaside_town.png?q-sign-algorithm=sha1&q-ak=AKIDBF6i7GCtKWS8ZkgOtACzX3MQDl37xYty&q-sign-time=1692601590;1865401590&q-key-time=1692601590;1865401590&q-header-list=&q-url-param-list=&q-signature=ca04ca92d990d94813029c0d9ef29537e5f4637c" alt="img2img input" width="512"/>

#### Img2Img output
![text2img_demo](./outputs/res_img2img_0.png) -->

### ControlNet Text2Img

#### Control Image
![text2img_demo](./control_bird_canny.png)

#### SD1.5 ControlNet Text2Img Output
![text2img_demo](./outputs/res_controlnet_txt2img_0.png)

#### SDXL ControlNet Text2Img Output
![text2img_demo](./outputs/res_controlnet_sdxl_txt2img.png)


## Docker Environment Recommendation

- For CUDA 11.X: we recommend `nvcr.io/nvidia/pytorch:22.12-py3`
- For CUDA 12.0: we recommend `nvcr.io/nvidia/pytorch:23.02-py3`

```bash
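docker pull nvcr.io/nvidia/pytorch:23.02-py3
# Mount the current directory at /lyraSD and start the container there
docker run --rm -it --gpus all -v "$(pwd)":/lyraSD -w /lyraSD nvcr.io/nvidia/pytorch:23.02-py3

# Inside the container:
pip install -r requirements.txt
python txt2img_demo.py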
```

## Citation
```bibtex
@Misc{lyraSD_2023,
  author =       {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Bin Wu},
  title =        {lyraSD: Accelerating Stable Diffusion with best flexibility},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraSD}},
  year =         {2024}
}
```

## Report bug
- Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraSD/discussions
- Mark bug reports with `[bug]` in the title.