metadata

license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
tags:
  - text-to-image
  - flux

Flux Dev

Run the Flux Dev model with limited VRAM in 8-bit mode. It's possible, but impractical: the downloads alone are "only" 40 GB.
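To get a feel for why 8-bit mode matters on limited VRAM, here is a rough back-of-the-envelope sketch of the weight memory at different precisions. The parameter counts are approximate assumptions (about 12B for the Flux transformer and about 4.7B for the T5-XXL encoder), and the estimate ignores quantization scales, activations, the VAE and the CLIP text encoder.

```python
# Rough weight-memory estimate. The parameter counts below are approximate
# assumptions for illustration, not values measured from the checkpoints.
PARAMS = {
    'flux_transformer': 12e9,  # assumed: roughly 12 billion parameters
    't5_xxl_encoder': 4.7e9,   # assumed: roughly 4.7 billion parameters
}

# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {'fp32': 4.0, 'fp16': 2.0, '8-bit': 1.0, 'int4': 0.5}


def weight_gib(n_params: float, bytes_per_weight: float) -> float:
    """Return the size of n_params weights in GiB, ignoring any overhead."""
    return n_params * bytes_per_weight / 2**30


for name, n_params in PARAMS.items():
    for precision, nbytes in BYTES_PER_WEIGHT.items():
        print(f'{name:>16} @ {precision:>5}: {weight_gib(n_params, nbytes):6.1f} GiB')
```

Under these assumptions, going from fp16 to 8-bit halves the memory needed for the weights of each model, which is what makes offloading one model at a time to a smaller GPU more realistic.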

Setup

pip install accelerate diffusers optimum-quanto transformers sentencepiece

In int4 mode, the pre-trained weights overflow in fp16 in some places, which results in a blank image.
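As a small illustration of what an fp16 overflow looks like in general, the sketch below uses only the standard library. It does not reproduce Quanto's int4 code path; the helper `to_fp16` is hypothetical and just rounds a Python float to half precision, saturating to infinity when the value is out of range. Once an infinity appears, later arithmetic can turn it into NaN, which is one plausible way a blank image can arise.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; out-of-range values become +/-inf.

    Hypothetical helper for illustration: struct refuses to pack values that
    do not fit into half precision, so the error is mapped to infinity here
    to mimic what tensor libraries do on overflow.
    """
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


in_range = to_fp16(60000.0)          # still representable in fp16
overflowed = to_fp16(70000.0)        # exceeds FP16_MAX, becomes inf
poisoned = overflowed - overflowed   # inf - inf is NaN and keeps propagating

print(in_range, overflowed, poisoned)
```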

Inference

from diffusers import FluxPipeline, FluxTransformer2DModel
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
import torch
from transformers import T5EncoderModel

class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

if __name__ == '__main__':
    T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)  # Duct-tape workaround for Quanto support.
    t5 = T5Model.from_pretrained('./flux-t5')._wrapped
    transformer = Flux2DModel.from_pretrained('./flux-fp8')._wrapped
    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev',
                                        text_encoder_2=t5,
                                        transformer=transformer)
    # Keeps the models on the CPU and moves one whole model at a time to the GPU when its forward pass is called.
    pipe.enable_model_cpu_offload()
    image = pipe('cat playing piano', num_inference_steps=10, output_type='pil').images[0]
    image.save('cat.png')
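
The script above loads pre-quantized weights from `./flux-t5` and `./flux-fp8`, but it does not show how those directories were created. The following is a minimal sketch of one way they could be produced with optimum-quanto, assuming float8 weights (`qfloat8`) for both models, fp16 as the load dtype, and the `transformer` and `text_encoder_2` subfolders of the FLUX.1-dev repository. The function names are hypothetical, and the heavy libraries are imported inside the functions so that defining them stays cheap.

```python
def quantize_flux_transformer(repo_id='black-forest-labs/FLUX.1-dev',
                              out_dir='./flux-fp8'):
    """Quantize the Flux transformer to float8 weights and save it (sketch)."""
    import torch
    from diffusers import FluxTransformer2DModel
    from optimum.quanto import qfloat8
    from optimum.quanto.models import QuantizedDiffusersModel

    class Flux2DModel(QuantizedDiffusersModel):
        base_class = FluxTransformer2DModel

    # Assumption: load in fp16; the full-precision weights must fit in CPU RAM.
    model = FluxTransformer2DModel.from_pretrained(
        repo_id, subfolder='transformer', torch_dtype=torch.float16)
    # quantize() quantizes and freezes the weights, then wraps the model.
    Flux2DModel.quantize(model, weights=qfloat8).save_pretrained(out_dir)


def quantize_t5_encoder(repo_id='black-forest-labs/FLUX.1-dev',
                        out_dir='./flux-t5'):
    """Quantize the T5 text encoder to float8 weights and save it (sketch)."""
    import torch
    from optimum.quanto import qfloat8
    from optimum.quanto.models import QuantizedTransformersModel
    from transformers import T5EncoderModel

    class T5Model(QuantizedTransformersModel):
        auto_class = T5EncoderModel

    model = T5EncoderModel.from_pretrained(
        repo_id, subfolder='text_encoder_2', torch_dtype=torch.float16)
    T5Model.quantize(model, weights=qfloat8).save_pretrained(out_dir)
```

Calling `quantize_flux_transformer()` and then `quantize_t5_encoder()` once should leave both directories in place for the inference script. This requires access to the gated FLUX.1-dev repository and enough system RAM to hold each unquantized model.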

Disclaimer

Use of this code and copies of this documentation requires citation and attribution to the author, via a link to their Hugging Face profile, in all resulting work.