---
license: creativeml-openrail-m
language:
- en
tags:
- text-to-image
- midjourney
- stable-diffusion
- disco-diffusion
- art
- arxiv:2208.12242
inference: true
library_name: diffusers
---
## Paint Journey V2 is [V1](https://huggingface.co/FredZhang7/paint-journey-v1) fine-tuned on 768x768 oil paintings by Midjourney V4, Open Journey V2, Disco Diffusion, and artists who gave permission

Begin the prompt with **((oil painting))** to add the oil paint effect. For digital and other painting styles, use prompts similar to those you would write for Midjourney V4 (with some tweaks), Stable Diffusion v1.5 (add more styles), Open Journey V2, or Disco Diffusion.

[![Open with Camenduru's WebUI in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AMLA-UBC/100-Exploring-the-World-of-Modern-Machine-Learning/blob/main/assets/PaintJourneyV2.ipynb)

## Examples

*All examples were generated using Camenduru's WebUI (see the Colab file)*

![](./assets/characters.png)
*⬆️ 768x1136 portraits, generated using descriptive prompts and without face restoration, [generation parameters](https://huggingface.co/FredZhang7/paint-journey-v2/raw/main/assets/character_settings.txt)*

![](./assets/nature.png)
*⬆️ 1280x768 (mostly) natural landscapes, used shorter prompts, [generation parameters](https://huggingface.co/FredZhang7/paint-journey-v2/raw/main/assets/nature_settings.txt)*

![](./assets/outerspace.png)
*⬆️ 1152x768 outer-space landscapes, used descriptive prompts, [generation parameters](https://huggingface.co/FredZhang7/paint-journey-v2/raw/main/assets/outerspace_settings.txt)*

![](./assets/lamborghini.png)
*⬆️ 1280x768 Lamborghini, [generation parameters](https://huggingface.co/FredZhang7/paint-journey-v2/raw/main/assets/lamborghini_settings.txt)*


## Comparisons

Paint Journey V2's paintings are closer to human-drawn art than Open Journey V2's.
Compared to models like Dreamlike Diffusion 1.0, PJ V2 tends to generate images at 768x768 or higher resolution with less noise.
This model can also generate stunning portraits at 768x1136 resolution without duplicated faces (with [Camenduru's WebUI](https://github.com/camenduru/stable-diffusion-webui)), a difficult task for models like DreamShaper 3.3.

At lower resolutions, DreamShaper 3.3 tends to generate higher-quality portraits than PJ V2 in terms of noise levels, given the same (short) positive and negative prompts.
However, PJ V2 can craft more stunning masterpieces when given more descriptive positive and negative prompts, and it can still generate beautiful landscapes with shorter prompts.


## Training
Rather than fine-tuning only its UNet, Paint Journey V2 focuses on fine-tuning its text encoder with a diverse range of prompts.
This allows for a seamless blend of the digital and oil painting styles into various other types of prompts, resulting in a more natural and dynamic output.

This model was trained on a curated dataset of roughly 300 images hand-picked from Midjourney, [Prompt Hero](https://prompthero.com/), [Pixabay](https://pixabay.com/images/search/paintings/), Open Journey V2, and Reddit.
Before training, I used R-ESRGAN 4x on many images to increase their resolution and reduce noise.

*PJ V2 will be trained with contrastive prompt-tuning soon to overcome its weaknesses described in Comparisons*


## Running out of prompts?
Useful resources: [Lexica.art](https://lexica.art/), [Fast GPT PromptGen](https://huggingface.co/FredZhang7/distilgpt2-stable-diffusion-v2), [Prompt Hero](https://prompthero.com/)


## Output Dimensions
Portrait sizes include, but are not limited to, `512x768`, `768x768`, and `768x1136`.
Landscape sizes include, but are not limited to, `768x512`, `768x768`, `1152x768`, and `1280x768`.
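
The sizes listed above are all multiples of 8 in both dimensions, which Stable Diffusion pipelines in `diffusers` require. The following is a minimal sketch for picking other sizes; the helper name `snap_to_multiple_of_8` is hypothetical and not part of this model or of any library.

```python
# Sizes listed in this model card (width, height).
PORTRAIT_SIZES = [(512, 768), (768, 768), (768, 1136)]
LANDSCAPE_SIZES = [(768, 512), (768, 768), (1152, 768), (1280, 768)]


def snap_to_multiple_of_8(width, height):
    """Round each dimension down to the nearest multiple of 8 (minimum 8).

    Hypothetical helper: the pipeline itself does not do this for you.
    """
    return max(8, (width // 8) * 8), max(8, (height // 8) * 8)


# Every size from the model card is already valid and passes through unchanged.
for size in PORTRAIT_SIZES + LANDSCAPE_SIZES:
    assert snap_to_multiple_of_8(*size) == size

# An arbitrary size is rounded down to the nearest valid one.
print(snap_to_multiple_of_8(1000, 700))
```

Larger sizes need more VRAM, so the listed sizes are a reasonable starting point before trying custom ones.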


## Camenduru's WebUI

```
git clone -b v1.6 https://github.com/camenduru/stable-diffusion-webui
```

<details>
<summary> Click to use Automatic1111's WebUI instead (its output may be less artistic) </summary>

```
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
```

</details>


Download [checkpoint](./paint_journey_v2.ckpt) and [vae](./paint_journey_v2.vae.pt) to the `./stable-diffusion-webui/models/Stable-diffusion` folder. Run `webui-user.bat`.
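
Before launching the WebUI, it can help to confirm that both files landed in the right folder. This is a small sketch that assumes the default clone location `./stable-diffusion-webui` and the file names from the download links above; the function name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the download links in this model card.
MODEL_FILES = ("paint_journey_v2.ckpt", "paint_journey_v2.vae.pt")


def missing_model_files(webui_root="./stable-diffusion-webui"):
    """Return the model files that are not yet in models/Stable-diffusion."""
    model_dir = Path(webui_root) / "models" / "Stable-diffusion"
    return [name for name in MODEL_FILES if not (model_dir / name).is_file()]


missing = missing_model_files()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("Checkpoint and VAE found.")
```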


## Diffusers

*Tip: wrapping a word in double, triple, or quadruple parentheses (e.g. "((WORD))") puts extra 'emphasis' on WORD*
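
To get a feel for how much emphasis nested parentheses add, here is a rough sketch. It assumes the Automatic1111-style convention in which each pair of parentheses multiplies a token's attention weight by 1.1; whether a given pipeline parses this syntax at all depends on the front end, and the helper `emphasis_weight` is hypothetical.

```python
# Assumed per-parenthesis multiplier (Automatic1111 WebUI convention).
EMPHASIS_STEP = 1.1


def emphasis_weight(token):
    """Strip matching outer parentheses and return (text, assumed weight)."""
    text = token.strip()
    depth = 0
    while len(text) >= 2 and text[0] == "(" and text[-1] == ")":
        text = text[1:-1]
        depth += 1
    return text, round(EMPHASIS_STEP ** depth, 4)


for token in ["vibrant colors", "((oil painting))", "(((uhd)))"]:
    print(emphasis_weight(token))
```

Under that assumption, `((oil painting))` carries a weight of about 1.21 and triple parentheses about 1.33, so the effect grows quickly with nesting depth.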

```bash
pip install --upgrade diffusers transformers
```
```python
import random
from datetime import datetime

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("FredZhang7/paint-journey-v2")
pipe = pipe.to("cuda")


def random_seed():
    return random.randint(0, 2**32 - 1)


prompt = "((oil painting)), gentle waves, bright blue sky, white sails billowing, sun glistening on the surface, salty sea air, distant horizon, calm breeze, birds soaring overhead, vibrant colors, artstation digital painting, high resolution, uhd, 4 k, 8k wallpaper"   # what you want to see
# what you don't want to see, joined into one string to pair with the single prompt
negative_prompt = ", ".join(["low-res", "blurry", "haze", "dark clouds looming", "choppy waves", "engine failing", "sails tattered", "stormy winds", "fear of the unknown"])
seed = random_seed()               # replace with the desired seed if needed
width, height = 1280, 768          # width and height of the generated image
cfg_scale = 7.5                    # classifier-free guidance scale, 7 to 11 is usually a good range
num_inference_steps = 40           # sampling steps


generator = torch.Generator("cuda").manual_seed(seed)
with torch.autocast("cuda"):
    image = pipe(prompt=prompt,
                 negative_prompt=negative_prompt,
                 num_inference_steps=num_inference_steps,
                 width=width, height=height,
                 generator=generator,
                 guidance_scale=cfg_scale).images[0]


def generate_filename(string, seed, max_prompt_chars=100):
    # drop characters that are invalid in file names, then truncate the prompt
    # so the full name stays within common file-system length limits
    invalid_chars = ["<", ">", ":", '"', "/", "\\", "|", "?", "*"]
    for char in invalid_chars:
        string = string.replace(char, "")
    return f"{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}_{seed}_{string[:max_prompt_chars]}"


image.save(f"./{generate_filename(prompt, seed)}.png")
```

## Safety Checker V2

The official [Stable Diffusion safety checker](https://huggingface.co/CompVis/stable-diffusion-safety-checker) uses about 1.22 GB of VRAM.
I recommend using [Google Safesearch Mini V2](https://huggingface.co/FredZhang7/google-safesearch-mini-v2) (220 MB) instead to save about 1.0 GB of VRAM.