Update README.md
#1 opened by yoorhim
README.md CHANGED
@@ -1 +1,41 @@
# KOALA-700M Model Card

## Model Description
KOALA, which stands for **KnOwledge-distillAtion in LAtent diffusion model**, is a text-to-image (T2I) synthesis model engineered to balance speed and image quality, which makes it well suited to resource-limited environments. By emphasizing self-attention in knowledge distillation, KOALA makes high-quality text-to-image synthesis more accessible and efficient in settings with constrained resources.

## Key Features
- **Efficient U-Net Architecture**: KOALA models use a simplified U-Net architecture that reduces the model size by up to 54% (KOALA-1B) and 69% (KOALA-700M) compared to Stable Diffusion XL (SDXL).
- **Self-Attention-Based Knowledge Distillation**: The core technique in KOALA is the distillation of self-attention features, which proves crucial for maintaining image generation quality.
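
The self-attention distillation idea above can be illustrated with a minimal sketch. It assumes the student is trained to match the teacher's self-attention features under a plain mean squared error; this card does not state the exact loss, so the function name `feature_distillation_loss`, the feature layout, and the toy values are hypothetical. Plain Python lists stand in for tensors.

```python
from typing import Sequence


def feature_distillation_loss(
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between teacher and student feature maps.

    Each argument holds one flattened feature map per distilled
    self-attention layer (hypothetical layout, for illustration only).
    """
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must expose the same number of layers")

    total, count = 0.0, 0
    for t_layer, s_layer in zip(teacher_feats, student_feats):
        if len(t_layer) != len(s_layer):
            raise ValueError("feature maps must have matching sizes")
        for t, s in zip(t_layer, s_layer):
            total += (t - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("no features to compare")
    return total / count


# Toy example: two "layers" of self-attention features (made-up numbers).
teacher = [[1.0, 2.0], [0.5, 0.5]]
student = [[1.0, 1.0], [0.5, 1.5]]
loss = feature_distillation_loss(teacher, student)
```

In a real training loop this term would be computed on tensors and typically combined with the usual denoising objective; that combination is not shown here.
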

## Model Architecture

## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers)
Inference code using 25 denoising steps:
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the KOALA-700M weights in half precision and move the pipeline to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained("etri-vilab/koala-700m", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a Golden Retriever like Leonardo da Vinci"
negative = "worst quality, low quality, illustration, low resolution"
# Run 25 denoising steps, matching the step count stated above.
image = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=25).images[0]
```
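
For repeatable results it can help to fix the random seed and save the output to disk. The following is a sketch under assumptions, not part of the original card: the helper names `build_call_kwargs` and `generate`, the default seed, and the output filename are all hypothetical. It relies on the pipeline call accepting `num_inference_steps` and `generator`, and it needs a CUDA GPU to actually run `generate`.

```python
from typing import Any, Dict


def build_call_kwargs(prompt: str, negative_prompt: str, steps: int = 25) -> Dict[str, Any]:
    """Collect the keyword arguments passed to the pipeline call."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
    }


def generate(prompt: str, negative_prompt: str, seed: int = 0,
             out_path: str = "koala_sample.png") -> str:
    """Generate one image with a fixed seed and save it (needs a CUDA GPU)."""
    # Imported lazily so the helper above can be used without a GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "etri-vilab/koala-700m", torch_dtype=torch.float16
    ).to("cuda")
    # A seeded generator makes repeated runs start from the same noise.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(generator=generator, **build_call_kwargs(prompt, negative_prompt)).images[0]
    image.save(out_path)  # hypothetical output filename
    return out_path
```

Calling `generate("A portrait painting of a Golden Retriever", "worst quality, low quality")` on a GPU machine would write the image to `koala_sample.png`; changing `seed` gives a different sample for the same prompt.
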

## Limitations and Bias
- Text Rendering: The models face challenges in rendering long, legible text within images.
- Complex Prompts: KOALA sometimes struggles with complex prompts involving multiple attributes.
- Dataset Dependencies: The current limitations are partially attributed to the characteristics of the training dataset (LAION-aesthetics-V2 6+).

## Citation
```bibtex
@misc{Lee@koala,
  title={KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis},
  author={Youngwan Lee and Kwanyong Park and Yoorhim Cho and Yong-Ju Lee and Sung Ju Hwang},
  year={2023},
  eprint={2312.04005},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```