twodgirl committed
Commit
93f6ee5
1 Parent(s): 25924b4

Create README.md

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
---
license: other
tags:
- text-to-image
- flux
---

# Flux Dev Quant

## Setup

```bash
pip install accelerate diffusers optimum-quanto transformers sentencepiece
pip install --upgrade git+https://github.com/huggingface/diffusers.git@main
```

There are places where the pre-trained weights **overflow** in fp16, resulting in a blank image. Wait for an updated diffusers release.

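The inference script below loads already-quantized checkpoints from `./flux-int4` (the Flux transformer) and `./flux-t5` (the T5 encoder). If you still need to produce those folders yourself, here is a minimal sketch using optimum-quanto's `quantize`/`save_pretrained` helpers; the `qint4` weight type and the fp16 load dtype are assumptions, so adjust them to your setup.

```python
# Sketch only: one way the quantized folders used below could be produced.
# qint4 weights and float16 loading are assumptions, not the author's recipe.
from diffusers import FluxTransformer2DModel
from optimum.quanto import qint4
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
import torch
from transformers import T5EncoderModel

class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

# Quantize the Flux transformer to int4 weights and serialize it.
transformer = FluxTransformer2DModel.from_pretrained(
    'black-forest-labs/FLUX.1-dev', subfolder='transformer', torch_dtype=torch.float16)
Flux2DModel.quantize(transformer, weights=qint4).save_pretrained('./flux-int4')

# Same treatment for the T5 text encoder.
t5 = T5EncoderModel.from_pretrained(
    'black-forest-labs/FLUX.1-dev', subfolder='text_encoder_2', torch_dtype=torch.float16)
T5Model.quantize(t5, weights=qint4).save_pretrained('./flux-t5')
```
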

## Inference

```python
from diffusers import AutoencoderKL, FluxPipeline, FlowMatchEulerDiscreteScheduler, FluxTransformer2DModel
import gc
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
import sys
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast

class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

FLUX_DEV = sys.argv[1] if len(sys.argv) > 1 else 'black-forest-labs/FLUX.1-dev'
FLUX_INT = sys.argv[2] if len(sys.argv) > 2 else './flux-int4'
T5_INT = sys.argv[3] if len(sys.argv) > 3 else './flux-t5'
SAMPLER_STEP = 2  # Number of denoising steps; raise for higher quality.
PROMPT_CLIP = ''  # Reserved for a separate CLIP prompt; unused below.
PROMPT_T5 = sys.argv[4] if len(sys.argv) > 4 else 'cat playing piano'

if __name__ == '__main__':
    torch.set_default_dtype(torch.float16)
    print('Step 1/5')
    # Duct tape for Quanto support: give T5EncoderModel the from_config() hook Quanto's loader expects.
    T5EncoderModel.from_config = lambda c: T5EncoderModel(c)
    wrapped_t5 = T5Model.from_pretrained(T5_INT)
    print('Step 2/5')
    wrapped_model = Flux2DModel.from_pretrained(FLUX_INT)
    print('Step 3/5')
    # Assemble the pipeline from the quantized transformer/T5 and the remaining original components.
    pipe = FluxPipeline.from_pretrained(FLUX_DEV,
                                        scheduler=FlowMatchEulerDiscreteScheduler.from_pretrained(FLUX_DEV, subfolder='scheduler'),
                                        text_encoder=CLIPTextModel.from_pretrained(FLUX_DEV, subfolder='text_encoder'),
                                        text_encoder_2=wrapped_t5._wrapped,
                                        tokenizer=CLIPTokenizer.from_pretrained(FLUX_DEV, subfolder='tokenizer'),
                                        tokenizer_2=T5TokenizerFast.from_pretrained(FLUX_DEV, subfolder='tokenizer_2'),
                                        transformer=wrapped_model._wrapped,
                                        vae=AutoencoderKL.from_pretrained(FLUX_DEV, subfolder='vae'),
                                        torch_dtype=torch.float16).to('cuda')
    latents = pipe(PROMPT_T5, num_inference_steps=SAMPLER_STEP, output_type='latent').images
    print('Step 4/5')
    # Free the transformer and T5 encoder before decoding to leave VRAM for the VAE.
    transformer = pipe.transformer.to('cpu')
    te_2 = pipe.text_encoder_2.to('cpu')
    pipe.transformer = None
    pipe.text_encoder_2 = None
    del transformer
    del te_2
    gc.collect()
    torch.cuda.empty_cache()
    print('Step 5/5')
    latents = FluxPipeline._unpack_latents(latents, 1024, 1024, pipe.vae_scale_factor)
    latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
    # Either use fp16 or move the vae to cpu and keep it in full precision.
    vae: AutoencoderKL = pipe.vae.to(dtype=torch.float16)
    image, = vae.decode(latents.to(dtype=vae.dtype), return_dict=False)
    image = pipe.image_processor.postprocess(image.detach(), output_type='pil')[0]
    image.save('./cat.png')
```
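
The model paths and the prompt can be overridden on the command line; for example, assuming the script above is saved as `infer.py` (the file name is illustrative):

```bash
python infer.py black-forest-labs/FLUX.1-dev ./flux-int4 ./flux-t5 'cat playing piano'
```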

## Disclaimer

Any use of this code or of copies of this documentation requires citation and attribution to the author, via a link to their Hugging Face profile, in all resulting work.

## License

[FLUX.1 Dev Non-Commercial License](http://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)