---
tags:
- text-to-image
- sd3
- lora
- diffusers
- template:sd-lora
- ai-toolkit
- history
- photo
- rank256
widget:
- text: 'HST style communist poster with text "JOIN RCA!", over autochrome color photo of Vladimir Lenin at a Dada cabaret in 1916 Zurich,
    dancing with red feathered drunken dinosaur, an early conceptual artist. Lenin is full of contageous awe, his blemished skin flushing
    with anxious excitement, his famous bald spot sweatily glistening under warm lights. In the back, Krupskaya and Inessa Armand laugh. '
  output:
    url: sd35_4k_v.jpg
- text: >-
    HST style photograph of a dark CIA agent Koala leaping at an excited
    Julian Assange and trying to steal his pills from his pockets,
    caption /PILLZAR! WHERE YOUR PILLS ARE!/, award-winning art photo
  output:
    url: sd35_4k_i.jpg
- text: >-
    HST style photo of realistic green-eyed black and white furred cat
    playing a piano and singing while pills rain from the sky,
    large 3d font caption text of /PILLZAR! WHERE YOUR PILLS ARE!/
    amateur photo shot on a cell phone
  output:
    url: sd35_4k_ii.jpg
- text: >-
    HST style autochrome photo with title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! /
    of an ether-drugged Pikachu sitting in a white plastic cylindical stacked
    medication dispenser with an unscrewable top, while gowned Marina Tsvetaeva
    gently pets pikachu on the head, David Lynch and Mucha styles,
    detailed faces, in a European city circa 1920
  output:
    url: sd35_4k_iii.jpg
base_model: stabilityai/stable-diffusion-3.5-large
license: creativeml-openrail-m
language:
- en
pipeline_tag: text-to-image
---
# HSTsd3ii
Model trained with [AI Toolkit by Ostris](https://github.com/ostris/ai-toolkit)
<Gallery />
## Trigger words
**HST style autochrome photo**
## Parameters/Settings/Options Info
*Dim: 256, Alpha: 256, Optimizer: Adamw8bit, LR: 4e-5* <br>
I set the config to train only a single block, namely MMDiT block 12, using the same config syntax I have repeatedly used for training Flux. <br>
However, judging by the results and the very large checkpoint files, I am not sure that single-block training actually took effect here. <br>
Fine-tuned using the **Google Colab Notebook** of **ai-toolkit**.<br>
I used an A100 via Colab Pro.
However, training SD3.5 may also work on Free Colab, or with less VRAM in general, <br>
especially with some combination of:<br> *a lower rank (try 4 or 8), a smaller dataset (which affects caching, bucketing, and pre-loading), a batch size of 1, the Adamw8bit optimizer, 512 resolution, possibly the `('low_vram', True)` argument, and possibly alternative quantization variants.* <br>
Generally, fine-tuning SD3.5 tends to need less VRAM than fine-tuning Flux.<br>
So, try it!<br>
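
The memory-saving suggestions above can be sketched as a small helper that patches one `process` entry of an ai-toolkit style config. This is only an illustration: the helper name `apply_low_vram_overrides` and the toy `process` dict are hypothetical, the keys mirror those in the Colab config reported in this card, and the values are untested starting points rather than verified settings.

```python
import copy
from collections import OrderedDict


def apply_low_vram_overrides(process, rank=8):
    """Return a copy of one `process` entry with lower-VRAM settings (hypothetical helper)."""
    patched = copy.deepcopy(process)
    # Lower LoRA rank; alpha is kept equal to the rank, as in this card's config.
    patched["network"]["linear"] = rank
    patched["network"]["linear_alpha"] = rank
    # Smallest batch, 8-bit optimizer, gradient checkpointing.
    patched["train"]["batch_size"] = 1
    patched["train"]["optimizer"] = "adamw8bit"
    patched["train"]["gradient_checkpointing"] = True
    # Quantize the base model and enable the slower low-VRAM path.
    patched["model"]["quantize"] = True
    patched["model"]["low_vram"] = True
    # Train at 512 resolution instead of 1024.
    for dataset in patched["datasets"]:
        dataset["resolution"] = [512]
    return patched


# Toy stand-in for the 'process' entry; only the keys touched above are included.
process = OrderedDict([
    ("network", OrderedDict([("type", "lora"), ("linear", 256), ("linear_alpha", 256)])),
    ("datasets", [OrderedDict([("folder_path", "/content/dataset"), ("resolution", [1024])])]),
    ("train", OrderedDict([("batch_size", 1), ("optimizer", "adamw8bit")])),
    ("model", OrderedDict([("name_or_path", "stabilityai/stable-diffusion-3.5-large"),
                           ("quantize", True)])),
])

low_vram_process = apply_low_vram_overrides(process, rank=8)
```

The original entry is deep-copied, so the full-size settings stay available for comparison runs.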
## Colab Config
**To use on Colab**, modify a Flux template notebook from [here](https://github.com/ostris/ai-toolkit/tree/main/notebooks) with the parameters from Ostris' example config for SD3.5 [here](https://github.com/ostris/ai-toolkit/blob/main/config/examples/train_lora_sd35_large_24gb.yaml). <br>
**My Colab config is reported below as an example.** <br> *(It includes the block-specification syntax for network arguments that works with ai-toolkit on Colab, at least for Flux.)* <br>
```py
from collections import OrderedDict

job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # this name will be used for the output folder and file names
        ('name', 'HSTsd3v'),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                # root folder to save training sessions/samples/weights
                ('training_folder', '/content/drive/MyDrive/HSTsd3v'),
                # log performance stats in the terminal every N steps
                ('performance_log_every', 600),
                ('device', 'cuda:0'),
                # if a trigger word is specified, it will be added to captions of training data if it does not already exist
                # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
                # NOTE: kept as it was run; the stock ai-toolkit template names this key 'trigger_word'
                ('HST', 'photo'),
                # ('network', OrderedDict([
                #     ('type', 'lora'),
                #     ('linear', 64),
                #     ('linear_alpha', 64)
                # ])),
                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', 256),
                    ('linear_alpha', 256),
                    ('network_kwargs', OrderedDict([
                        ('only_if_contains', "transformer.transformer_blocks.{12}")]))
                ])),
                ('save', OrderedDict([
                    ('dtype', 'float16'),  # precision to save
                    ('save_every', 250),  # save every this many steps
                    ('push_to_hub', True),
                    ('hf_repo_id', 'AlekseyCalvin/HSTsd3v'),
                    ('hf_private', False),
                    ('max_step_saves_to_keep', 10)  # how many intermediate saves to keep
                ])),
                ('datasets', [
                    # datasets are a folder of images. captions need to be txt files with the same name as the image
                    # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
                    # images will automatically be resized and bucketed into the resolution specified
                    OrderedDict([
                        ('folder_path', '/content/dataset'),
                        ('caption_ext', 'txt'),
                        ('caption_dropout_rate', 0.05),  # will drop out the caption 5% of the time
                        ('shuffle_tokens', False),  # shuffle caption order, split by commas
                        ('cache_latents_to_disk', True),  # leave this true unless you know what you're doing
                        ('resolution', [1024])
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', 1),
                    ('steps', 4000),  # total number of steps to train; 500 - 4000 is a good range
                    ('gradient_accumulation_steps', 1),
                    ('train_unet', True),
                    ('train_text_encoder', False),  # May not fully work with SD3 yet
                    ('gradient_checkpointing', True),  # keep this on unless you have a ton of vram
                    ('noise_scheduler', 'flowmatch'),  # for training only
                    ('timestep_type', 'linear'),  # linear or sigmoid
                    ('optimizer', 'adamw8bit'),
                    ('lr', 4e-5),
                    # skip the pre-training sample
                    ('skip_first_sample', True),
                    # uncomment to completely disable sampling
                    # ('disable_sampling', True),
                    # uncomment to use new bell-curved weighting. Experimental but may produce better results
                    # ('linear_timesteps', True),
                    # ema will smooth out learning, but could slow it down. Recommended to leave on.
                    ('ema_config', OrderedDict([
                        ('use_ema', True),
                        ('ema_decay', 0.99)
                    ])),
                    # will probably need this if gpu supports it for flux, other dtypes may not work correctly
                    ('dtype', 'bf16')
                ])),
                ('model', OrderedDict([
                    # huggingface model name or path
                    ('name_or_path', 'stabilityai/stable-diffusion-3.5-large'),
                    ('is_v3', True),
                    ('quantize', True),  # run 8bit mixed precision
                    # low_vram is painfully slow to fuse in the adapter; avoid it unless absolutely necessary
                    # ('low_vram', True),  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),  # must match train.noise_scheduler
                    ('sample_every', 200),  # sample every this many steps
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', [
                        # you can add [trigger] to the prompts here and it will be replaced with the trigger word
                        # '[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
                        ' HST style communist poster with title text "JOIN RCA!", under an autochrome color photograph of Vladimir Lenin at a Cabaret in Zurich circa 1916, dancing with a red feathered drunken anarchist, an early conceptual artist. Singing to his new dancing partner, Lenin has a face full of contageous awe, his moderately blemished lined skin with visible pores flushing with anxious excitement, his bald spot sweatily glistening under warm lights. Behind, Krupskaya and Inessa Armand sit side-by-side at a bar stand uproriously laughing at the dancers.',
                        'HST autochrome style analog dslr award-winning 8k art photo showing a nurse battling a giant scattered pill creature, above text caption of \\PILLZAR! WHERE PILLS ARE!\\, in a highly realistic modern American medical hospital ',
                        'HST style photograph of a dark CIA agent Koala leaping at an excited Julian Assange and trying to steal his pills from his pockets, caption /PILLZAR! WHERE YOUR PILLS ARE!/, award-winning art photo',
                        'HST style photo of realistic green-eyed black and white furred cat playing a piano and singing while pills rain from the sky, large 3d font caption text of /PILLZAR! WHERE YOUR PILLS ARE!/ amateur photo shot on a cell phone',
                        'HST style autochrome photo poster with 3d title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! / of an ether-drugged Pikachu sitting in a white plastic cylindical stacked medication dispenser with an unscrewable top, while a gowned Marina Tsvetaeva gently pets pikachu on the head, David Lynch and Mucha styles, detailed faces, in a European city circa 1920, lifelike anatomy'
                    ]),
                    ('neg', 'wrong, broken, warped, unrealistic, untextured, misspelling, messy, bad quality'),  # negative prompt; the Flux template notes it is not used on flux
                    ('seed', 42),
                    ('walk_seed', True),
                    ('guidance_scale', 4),
                    ('sample_steps', 25)  # number of sampling steps
                ]))
            ])
        ])
    ])),
    # you can add any additional meta info here. [name] is replaced with config name at top
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])
```
```
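
One possible reason the single-block restriction did not take effect is the shape of the `only_if_contains` value in the config above. The sketch below is not ai-toolkit's actual code. It assumes the trainer keeps a module when any entry of `only_if_contains` is a substring of the module name, and the function `is_selected` and the module names are hypothetical. Under that assumption, a bare string is iterated character by character, so nearly every module matches, while a list entry containing literal `{12}` braces matches nothing.

```python
def is_selected(module_name, only_if_contains):
    # Assumed filter (not verified against ai-toolkit):
    # keep a module if any entry is a substring of its name.
    return any(fragment in module_name for fragment in only_if_contains)


# Hypothetical module names, for illustration only.
module_names = [
    "transformer.transformer_blocks.3.attn.to_q",
    "transformer.transformer_blocks.12.attn.to_q",
    "transformer.transformer_blocks.12.ff.net.0.proj",
    "transformer.transformer_blocks.21.attn.to_k",
]

as_string = "transformer.transformer_blocks.{12}"         # value as written in the config
as_braced_list = ["transformer.transformer_blocks.{12}"]  # list, but braces are literal
as_plain_list = ["transformer.transformer_blocks.12."]    # list with a plain block index

# Iterating a string yields single characters such as "t", which occur in every name.
selected_by_string = [n for n in module_names if is_selected(n, as_string)]
selected_by_braced_list = [n for n in module_names if is_selected(n, as_braced_list)]
selected_by_plain_list = [n for n in module_names if is_selected(n, as_plain_list)]

print(len(selected_by_string), len(selected_by_braced_list), len(selected_by_plain_list))
```

If the assumption holds, passing a list with a plain index and a trailing dot (so that block 12 is not confused with blocks 120 and up) would be the variant to try. The exact module naming inside the trainer would still need checking.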
## Download model and use it with ComfyUI, AUTOMATIC1111, SD.Next, Invoke AI, etc.
Weights for this model are available in Safetensors format.
[Download](/AlekseyCalvin/HSTsd3iii/tree/main) them in the Files & versions tab.
## Use it with the [🧨 diffusers library](https://github.com/huggingface/diffusers)
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained('stabilityai/stable-diffusion-3.5-large', torch_dtype=torch.float16).to('cuda')
pipeline.load_lora_weights('AlekseyCalvin/HSTsd3iii', weight_name='HSTsd3ii.safetensors')
# The apostrophe in "cabaret's" is escaped so the single-quoted prompt string stays valid Python.
image = pipeline(' HST style photorealistic communist poster with text "JOIN RCA!", over autochrome color photo of Vladimir Lenin at a Dada cabaret in 1916 Zurich, dancing with a red feathered drunken dinosaur who is an early conceptual artist. Lenin has a face full of contageous awe, his blemished skin with visible pores flushing with anxious excitement and his famous bald spot sweatily glistening under the cabaret\'s warm lights. In the back, Krupskaya and Inessa Armand uproriously laugh at the dancers.').images[0]
image.save("my_image.png")
```
For more details, including weighting, merging and fusing LoRAs, check the [documentation on loading LoRAs in diffusers](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters).