---
tags:
- text-to-image
- sd3
- lora
- diffusers
- template:sd-lora
- ai-toolkit
- history
- photo
- rank256
widget:
- text: 'HST style communist poster with text "JOIN RCA!", over autochrome color photo of Vladimir Lenin at a Dada cabaret in 1916 Zurich, 
dancing with red feathered drunken dinosaur, an early conceptual artist. Lenin is full of contageous awe, his blemished skin flushing 
with anxious excitement, his famous bald spot sweatily glistening under warm lights. In the back, Krupskaya and Inessa Armand laugh. '
  output:
    url: sd35_4k_v.jpg
- text: >-
    HST style photograph of a dark CIA agent Koala leaping at an excited 
    Julian Assange and trying to steal his pills from his pockets, 
    caption /PILLZAR! WHERE YOUR PILLS ARE!/, award-winning art photo

  output:
    url: sd35_4k_i.jpg
- text: >-
    HST style photo of realistic green-eyed black and white furred cat 
    playing a piano and singing while pills rain from the sky, 
    large 3d font caption text of /PILLZAR! WHERE YOUR PILLS ARE!/ 
    amateur photo shot on a cell phone

  output:
    url: sd35_4k_ii.jpg
- text: >-
    HST style autochrome photo with title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! / 
    of an ether-drugged Pikachu sitting in a white plastic cylindical stacked 
    medication dispenser with an unscrewable top, while gowned Marina Tsvetaeva 
    gently pets pikachu on the head, David Lynch and Mucha styles, 
    detailed faces, in a European city circa 1920
  output:
    url: sd35_4k_iii.jpg
base_model: stabilityai/stable-diffusion-3.5-large
license: creativeml-openrail-m
language:
- en
pipeline_tag: text-to-image
---

# HSTsd3ii
Model trained with [AI Toolkit by Ostris](https://github.com/ostris/ai-toolkit)
<Gallery />

## Trigger words

**HST style autochrome photo**

## Parameters/Settings/Options Info
*Dim: 256, Alpha: 256, Optimizer: Adamw8bit, LR: 4e-5* <br>
**I set the config to train only a single block**, namely MMDiT block 12, using the same config syntax I've repeatedly used for training Flux. <br>
However, I'm not sure single-block training worked here, judging by the results and the very large checkpoint files. <br>
Fine-tuned using the **Google Colab Notebook** of **ai-toolkit**, on an A100 via Colab Pro. <br>
Training SD3.5 may also work on Free Colab, or with lower VRAM in general, especially with some combination of the following: <br>
*a lower rank (try 4 or 8), a smaller dataset (which affects caching, bucketing, and pre-loading), a batch size of 1, the Adamw8bit optimizer, 512 resolution, possibly the `('low_vram', True)` argument, and possibly an alternate quantization variant.* <br>
Generally, fine-tuning SD3.5 tends to use less VRAM than fine-tuning Flux. <br>
So, try it! <br>
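
The lower-VRAM suggestions above can be sketched as a set of overrides on one `process` entry, written in the same `OrderedDict` style as the Colab config on this page. This is an untested sketch: the helper name `apply_low_vram_overrides` and the stand-in `process` are hypothetical, the keys are the ones that appear in my config, and setting alpha equal to rank simply mirrors the Dim/Alpha pairing used here.

```py
from collections import OrderedDict


def apply_low_vram_overrides(process, rank=8, resolution=512):
    """Apply the memory-saving settings in place and return the same config."""
    process['network']['linear'] = rank            # lower rank (try 4 or 8)
    process['network']['linear_alpha'] = rank      # alpha kept equal to rank
    process['datasets'][0]['resolution'] = [resolution]
    process['train']['batch_size'] = 1
    process['train']['optimizer'] = 'adamw8bit'
    process['train']['gradient_checkpointing'] = True
    process['model']['quantize'] = True
    process['model']['low_vram'] = True            # slower, but uses less VRAM
    return process


# Minimal stand-in for one entry of the 'process' list (hypothetical subset)
process = OrderedDict([
    ('network', OrderedDict([('type', 'lora'), ('linear', 256), ('linear_alpha', 256)])),
    ('datasets', [OrderedDict([('folder_path', '/content/dataset'), ('resolution', [1024])])]),
    ('train', OrderedDict([('batch_size', 1), ('optimizer', 'adamw8bit')])),
    ('model', OrderedDict([('name_or_path', 'stabilityai/stable-diffusion-3.5-large'),
                           ('quantize', True)])),
])

apply_low_vram_overrides(process)
```

Whether these settings are enough for a given GPU is something I have not measured, so treat the numbers as starting points.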

## Colab Config

**To use on Colab**, modify a Flux template notebook from [here](https://github.com/ostris/ai-toolkit/tree/main/notebooks) with parameters from Ostris' example config for SD3.5 [here](https://github.com/ostris/ai-toolkit/blob/main/config/examples/train_lora_sd35_large_24gb.yaml). <br>
**My Colab config is reported below as an example.** <br> *(It includes the block-specification network-argument syntax that works on ai-toolkit via Colab, at least for Flux.)* <br>

```py
from collections import OrderedDict

job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # this name will be the folder and filename name
        ('name', 'HSTsd3v'),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                # root folder to save training sessions/samples/weights
                ('training_folder', '/content/drive/MyDrive/HSTsd3v'),
                # print performance stats in the terminal every N steps
                ('performance_log_every', 600),
                ('device', 'cuda:0'),
                # if a trigger word is specified, it will be added to captions of training data if it does not already exist
                # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
                ('HST', 'photo'),
                # ('network', OrderedDict([
                #     ('type', 'lora'),
                #     ('linear', 64),
                #     ('linear_alpha', 64)
                # ])),

                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', 256),
                    ('linear_alpha', 256),
                    # intended to restrict training to MMDiT block 12
                    ('network_kwargs', OrderedDict([
                        ('only_if_contains', "transformer.transformer_blocks.{12}")
                    ]))
                ])),
                ('save', OrderedDict([
                    ('dtype', 'float16'),  # precision to save
                    ('save_every', 250),  # save every this many steps
                    ('push_to_hub', True),
                    ('hf_repo_id', 'AlekseyCalvin/HSTsd3v'),
                    ('hf_private', False),
                    ('max_step_saves_to_keep', 10)  # how many intermittent saves to keep
                ])),
                ('datasets', [
                    # datasets are a folder of images. captions need to be txt files with the same name as the image
                    # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
                    # images will automatically be resized and bucketed into the resolution specified
                    OrderedDict([
                        ('folder_path', '/content/dataset'),
                        ('caption_ext', 'txt'),
                        ('caption_dropout_rate', 0.05),  # will drop out the caption 5% of time
                        ('shuffle_tokens', False),  # shuffle caption order, split by commas
                        ('cache_latents_to_disk', True),  # leave this true unless you know what you're doing
                        ('resolution', [1024])
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', 1),
                    ('steps', 4000),  # total number of steps to train 500 - 4000 is a good range
                    ('gradient_accumulation_steps', 1),
                    ('train_unet', True),
                    ('train_text_encoder', False),  # may not fully work with SD3 yet
                    ('gradient_checkpointing', True),  # need this on unless you have a ton of vram
                    ('noise_scheduler', 'flowmatch'),  # for training only
                    ('timestep_type', 'linear'), # linear or sigmoid
                    ('optimizer', 'adamw8bit'),
                    ('lr', 4e-5),

                    # skip the pre-training sample
                    ('skip_first_sample', True),

                    # uncomment to completely disable sampling
                    # ('disable_sampling', True),

                    # uncomment to use new bell curved weighting. Experimental but may produce better results
                    # ('linear_timesteps', True),

                    # ema will smooth out learning, but could slow it down. Recommended to leave on.
                    ('ema_config', OrderedDict([
                        ('use_ema', True),
                        ('ema_decay', 0.99)
                    ])),

                    # will probably need this if gpu supports it for flux, other dtypes may not work correctly
                    ('dtype', 'bf16')
                ])),
                ('model', OrderedDict([
                    # huggingface model name or path
                    ('name_or_path', 'stabilityai/stable-diffusion-3.5-large'),
                    ('is_v3', True),
                    ('quantize', True),  # run 8bit mixed precision
                    # low_vram is painfully slow to fuse in the adapter; avoid it unless absolutely necessary
                    # ('low_vram', True),  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),  # must match train.noise_scheduler
                    ('sample_every', 200),  # sample every this many steps
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', [
                        # you can add [trigger] to the prompts here and it will be replaced with the trigger word
                        #'[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
                        ' HST style communist poster with title text "JOIN RCA!", under an autochrome color photograph of Vladimir Lenin at a Cabaret in Zurich circa 1916, dancing with a red feathered drunken anarchist, an early conceptual artist. Singing to his new dancing partner, Lenin has a face full of contageous awe, his moderately blemished lined skin with visible pores flushing with anxious excitement, his bald spot sweatily glistening under warm lights. Behind, Krupskaya and Inessa Armand sit side-by-side at a bar stand uproriously laughing at the dancers.',
                        # backslashes are escaped so the caption delimiters stay literal
                        'HST autochrome style analog dslr award-winning 8k art photo showing a nurse battling a giant scattered pill creature, above text caption of \\PILLZAR! WHERE PILLS ARE!\\, in a highly realistic modern American medical hospital ',
                        'HST style photograph of a dark CIA agent Koala leaping at an excited Julian Assange and trying to steal his pills from his pockets, caption /PILLZAR! WHERE YOUR PILLS ARE!/, award-winning art photo',
                        'HST style photo of realistic green-eyed black and white furred cat playing a piano and singing while pills rain from the sky, large 3d font caption text of /PILLZAR! WHERE YOUR PILLS ARE!/ amateur photo shot on a cell phone',
                        'HST style autochrome photo poster with 3d title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! / of an ether-drugged Pikachu sitting in a white plastic cylindical stacked medication dispenser with an unscrewable top, while a gowned Marina Tsvetaeva gently pets pikachu on the head, David Lynch and Mucha styles, detailed faces, in a European city circa 1920, lifelike anatomy'
                    ]),
                    ('neg', 'wrong, broken, warped, unrealistic, untextured, misspelling, messy, bad quality'),  # not used on flux
                    ('seed', 42),
                    ('walk_seed', True),
                    ('guidance_scale', 4),
                    ('sample_steps', 25)  # number of sampling steps
                ]))
            ])
        ])
    ])),
    # you can add any additional meta info here. [name] is replaced with config name at top
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])
```
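
Since I'm unsure whether the `only_if_contains` value above actually limited training to block 12, here is a small sanity check on what a filter string would select. It assumes the filter is a list of substrings matched literally against module names, which is an assumption about ai-toolkit's behavior that I have not verified; the helper `select_modules` and the module names are made up for illustration.

```py
def select_modules(module_names, only_if_contains):
    """Keep the names that contain at least one of the given substrings."""
    return [name for name in module_names
            if any(pattern in name for pattern in only_if_contains)]


# Hypothetical module names, not read from a real model
names = [
    'transformer.transformer_blocks.1.attn.to_q',
    'transformer.transformer_blocks.12.attn.to_q',
    'transformer.transformer_blocks.12.ff.net.0.proj',
    'transformer.transformer_blocks.21.attn.to_q',
]

# The trailing dot keeps "12." from matching other block indices
plain = select_modules(names, ['transformer.transformer_blocks.12.'])

# With purely literal matching, the braces are just characters
braced = select_modules(names, ['transformer.transformer_blocks.{12}'])

print(len(plain), len(braced))  # prints: 2 0
```

Under literal matching the braced form selects nothing, so if the toolkit does not treat braces specially, a plain `...transformer_blocks.12.` substring may be worth trying.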

## Download model and use it with ComfyUI, AUTOMATIC1111, SD.Next, Invoke AI, etc.

Weights for this model are available in Safetensors format.

[Download](/AlekseyCalvin/HSTsd3iii/tree/main) them in the Files & versions tab.
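
Given my doubt in the parameters notes about whether single-block training took effect, one way to check a downloaded file is to count LoRA tensors per transformer block. The sketch below is a rough aid, not a verified tool: the key names in `example_keys` are hypothetical, and the regular expression assumes block indices appear as `transformer_blocks.<n>.` or `transformer_blocks_<n>_` in the tensor names.

```py
import re
from collections import Counter

# Assumed naming pattern for block indices inside LoRA tensor names
BLOCK_RE = re.compile(r'transformer_blocks[._](\d+)[._]')


def count_keys_per_block(keys):
    """Return {block_index: number_of_tensor_names} sorted by block index."""
    counts = Counter()
    for key in keys:
        match = BLOCK_RE.search(key)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


def keys_from_safetensors(path):
    """List tensor names in a .safetensors file (needs the safetensors package)."""
    from safetensors import safe_open  # imported lazily so the sketch runs without it
    with safe_open(path, framework='pt') as f:
        return list(f.keys())


# Hypothetical tensor names, for illustration only
example_keys = [
    'transformer.transformer_blocks.12.attn.to_q.lora_A.weight',
    'transformer.transformer_blocks.12.attn.to_q.lora_B.weight',
    'transformer.transformer_blocks.13.attn.to_q.lora_A.weight',
]
summary = count_keys_per_block(example_keys)
print(summary)  # prints: {12: 2, 13: 1}
```

For a real file, pass `keys_from_safetensors('HSTsd3ii.safetensors')` to `count_keys_per_block`; more than one block index in the result would suggest the filter did not restrict training to block 12.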

## Use it with the [🧨 diffusers library](https://github.com/huggingface/diffusers)

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained('stabilityai/stable-diffusion-3.5-large', torch_dtype=torch.float16).to('cuda')
pipeline.load_lora_weights('AlekseyCalvin/HSTsd3iii', weight_name='HSTsd3ii.safetensors')
# Triple quotes let the prompt contain both double quotes and an apostrophe
prompt = """HST style photorealistic communist poster with text "JOIN RCA!", over autochrome color photo of Vladimir Lenin at a Dada cabaret in 1916 Zurich, dancing with a red feathered drunken dinosaur who is an early conceptual artist. Lenin has a face full of contagious awe, his blemished skin with visible pores flushing with anxious excitement and his famous bald spot sweatily glistening under the cabaret's warm lights. In the back, Krupskaya and Inessa Armand uproariously laugh at the dancers."""
image = pipeline(prompt).images[0]
image.save("my_image.png")
```

For more details, including weighting, merging, and fusing LoRAs, check the [documentation on loading LoRAs in diffusers](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters).