twodgirl/Flux-dev-optimum-quant-qfloat8


	---
	license: other
	license_name: flux-1-dev-non-commercial-license
	license_link: >-
	https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
	tags:
	- text-to-image
	- flux
	---

	# Flux Dev

	Run the Flux Dev model with limited VRAM in 8-bit mode. It's possible but impractical, since the downloads alone are "only" 40 GB.
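
For a rough sense of why 8-bit mode helps with limited VRAM, the arithmetic can be sketched as follows. The 12 billion parameter count for the Flux transformer is an assumption taken from Black Forest Labs' public description of the model, and the helper `weight_footprint_gb` is hypothetical. The estimate covers weights only and ignores quantization scales, activations, the text encoders, and the VAE.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: the publicly stated parameter count of the Flux transformer.
FLUX_TRANSFORMER_PARAMS = 12e9

fp16_gb = weight_footprint_gb(FLUX_TRANSFORMER_PARAMS, 16)
fp8_gb = weight_footprint_gb(FLUX_TRANSFORMER_PARAMS, 8)
print(f'16-bit weights: {fp16_gb:.0f} GB, 8-bit weights: {fp8_gb:.0f} GB')
```

Under these assumptions, 8-bit weights halve the transformer's footprint compared with 16-bit weights.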

	## Setup

	```
	pip install accelerate diffusers optimum-quanto transformers sentencepiece
	```

	In int4 mode, the pre-trained fp16 weights overflow in places, which results in a blank image.
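
As a generic illustration of this kind of failure (not a trace of Quanto's int4 code path), float16 cannot represent values above 65504. A larger result becomes `inf`, and further arithmetic on `inf` can produce `NaN`. The sketch below uses NumPy directly, which is an assumption: it is not listed in the setup command, although diffusers depends on it.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # largest finite float16 value

# Silence the overflow/invalid warnings that NumPy would otherwise emit.
with np.errstate(over='ignore', invalid='ignore'):
    # A value inside the float16 range survives the cast unchanged.
    ok = np.float16(60000.0)
    # Doubling it exceeds the range, so the result overflows to infinity.
    overflowed = ok * np.float16(2.0)
    # inf - inf is undefined, so the result is NaN, which then spreads
    # through every later operation that touches it.
    poisoned = overflowed - overflowed

print(FP16_MAX, ok, overflowed, poisoned)
```

Once `NaN` values reach the decoded image, the pixel data is no longer meaningful, which is consistent with the blank output described above.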

	## Inference

	```python
	from diffusers import FluxPipeline, FluxTransformer2DModel
	from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
	import torch
	from transformers import T5EncoderModel

	# Quanto wrapper classes: each one names the model class it wraps.
	class Flux2DModel(QuantizedDiffusersModel):
	    base_class = FluxTransformer2DModel

	class T5Model(QuantizedTransformersModel):
	    auto_class = T5EncoderModel

	if __name__ == '__main__':
	    # Duct-tape workaround for Quanto support: build T5 from its config in float16.
	    T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)
	    # Load the quantized checkpoints from local directories and unwrap the models.
	    t5 = T5Model.from_pretrained('./flux-t5')._wrapped
	    transformer = Flux2DModel.from_pretrained('./flux-fp8')._wrapped
	    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev',
	                                        text_encoder_2=t5,
	                                        transformer=transformer)
	    # Moves one whole model at a time to the GPU while its forward pass runs.
	    pipe.enable_model_cpu_offload()
	    image = pipe('cat playing piano', num_inference_steps=10, output_type='pil').images[0]
	    image.save('cat.png')
	```
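
The inference script loads `./flux-t5` and `./flux-fp8` from disk, but the README does not show how those directories were created. A minimal sketch of one way to produce them follows, assuming they are ordinary Quanto checkpoints written with `quantize()` and `save_pretrained()`. The function name `quantize_and_save`, the `transformer` and `text_encoder_2` subfolder arguments, and the float16 load dtype are assumptions for illustration, not the author's confirmed procedure.

```python
def quantize_and_save(repo='black-forest-labs/FLUX.1-dev',
                      transformer_dir='./flux-fp8',
                      t5_dir='./flux-t5'):
    """Hypothetical helper: quantize the Flux transformer and the T5 encoder
    to qfloat8 weights and save each one as a Quanto checkpoint."""
    # Heavy imports are kept local so that defining the helper stays cheap.
    import torch
    from diffusers import FluxTransformer2DModel
    from optimum.quanto import qfloat8
    from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
    from transformers import T5EncoderModel

    # Same wrapper classes as in the inference script.
    class Flux2DModel(QuantizedDiffusersModel):
        base_class = FluxTransformer2DModel

    class T5Model(QuantizedTransformersModel):
        auto_class = T5EncoderModel

    # Assumption: load in float16, matching the float16 cast in the inference script.
    transformer = FluxTransformer2DModel.from_pretrained(
        repo, subfolder='transformer', torch_dtype=torch.float16)
    # quantize() quantizes and freezes the weights, then returns the wrapper.
    Flux2DModel.quantize(transformer, weights=qfloat8).save_pretrained(transformer_dir)

    t5 = T5EncoderModel.from_pretrained(
        repo, subfolder='text_encoder_2', torch_dtype=torch.float16)
    T5Model.quantize(t5, weights=qfloat8).save_pretrained(t5_dir)
```

Calling `quantize_and_save()` once needs enough CPU RAM to hold each unquantized model and access to the gated FLUX.1-dev repository. Afterwards the inference script can load the two directories as shown above.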

	## Disclaimer

	Use of this code and the copy of documentation requires citation and attribution to the author via a link to their Hugging Face profile in all resulting work.