External CLIP vs. internal; VRAM utilization question
The internal CLIP was baked at fp8 to reduce computational requirements for people with low VRAM, and the model's weights were merged in such a way as to allow generations at 4 steps.
As far as I know you cannot speed it up further; it only takes 4 seconds to generate 1024 x 1024 on 24GB VRAM.
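For anyone curious what a merge like that looks like mechanically: 4-step-capable Flux merges are usually made by blending flux1-dev with flux1-schnell (or a distillation LoRA). The exact formula used for this model isn't shared here, so treat the following as a rough sketch; the dev/schnell pairing and the 0.5 ratio are assumptions, not the author's recipe.

```python
# A plain linear merge of two Flux checkpoints -- a sketch, NOT the exact
# formula used for this model. The dev/schnell pairing and the 0.5 ratio
# are assumptions for illustration.
from safetensors.torch import load_file, save_file

ALPHA = 0.5  # blend ratio (hypothetical)

dev = load_file("flux1-dev.safetensors")
schnell = load_file("flux1-schnell.safetensors")

merged = {}
for key, w in dev.items():
    # Average matching tensors in fp32, then cast back to the source dtype.
    merged[key] = (ALPHA * w.float() + (1 - ALPHA) * schnell[key].float()).to(w.dtype)

save_file(merged, "flux1-merged.safetensors")
```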
Hate to post in a closed topic, but I'm not sure the T5 weights in this checkpoint are actually FP8. Transformer + CLIP + T5 + VAE checkpoints for Flux that are fully FP8 should be ~17GB; the extra ~4GB makes me think the T5 was saved as FP16 while the transformer was saved as FP8. See https://huggingface.co/Comfy-Org/flux1-dev/tree/main as an example.
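The ~17GB figure follows from rough parameter counts times bytes per weight (fp8 = 1 byte, fp16 = 2 bytes). The counts below are public approximations, not exact numbers from this checkpoint:

```python
# Back-of-the-envelope checkpoint sizes. Parameter counts are rough,
# public approximations, not numbers read from this specific file.
params = {
    "flux_transformer": 11.9e9,  # ~12B
    "t5xxl": 4.7e9,              # ~4.7B
    "clip_l": 0.12e9,
    "vae": 0.08e9,
}

def size_gb(n_params, bytes_per_weight):
    return n_params * bytes_per_weight / 1e9

# Everything at fp8 (1 byte per weight):
all_fp8 = sum(size_gb(n, 1) for n in params.values())

# Same, but T5 left at fp16: one extra byte per T5 weight.
t5_fp16 = all_fp8 + size_gb(params["t5xxl"], 1)

print(f"all fp8:    ~{all_fp8:.1f} GB")  # ~16.8 GB
print(f"T5 at fp16: ~{t5_fp16:.1f} GB")  # ~21.5 GB, i.e. about +4.7 GB
```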
OK, that corroborates my findings that generations were the same with the external fp16 CLIP and the internal one.
I'm confident I baked the t5xxl_fp8_e4m3fn text encoder into this model merge.
The resulting file size will depend on the type of merge (the formula) and the quant map.
Every merge will have a different formula and file size; I used the quantization formula provided by Kijai and Comfy-Org's tip on the 4-step merge.
Prior to merging, I quantized the models using Kijai's formula, which resulted in two 12 GB files.
Additionally, I baked the VAE and CLIP fp8 components into the model, which add an extra 4.5 GB + 319 MB.
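For anyone wanting to reproduce the general idea: the simplest way to get an fp8_e4m3fn checkpoint is a direct dtype cast over the state dict. To be clear, this is a naive sketch of that step, not Kijai's actual formula (which handles scaling more carefully):

```python
# Naive fp8 re-save of a checkpoint: a sketch of the general idea, not
# Kijai's actual script. Needs PyTorch >= 2.1 for the float8 dtypes.
import torch
from safetensors.torch import load_file, save_file

state = load_file("flux1-dev-fp16.safetensors")

out = {}
for key, w in state.items():
    # Cast only floating-point tensors; a direct cast clips anything
    # outside the e4m3fn range (roughly +/-448) and drops precision.
    if w.is_floating_point():
        out[key] = w.to(torch.float8_e4m3fn)
    else:
        out[key] = w

save_file(out, "flux1-dev-fp8_e4m3fn.safetensors")
```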
The T5 model in fp16 versus fp8 doesn't show a significant difference in output, so many people prefer the FP8 model for its greater computational efficiency during generation.
You could also run a test on the model to observe VRAM usage during inference, to assess its efficiency on an 8 GB card versus using an external fp16 T5 CLIP.
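Something like this works for measuring the peak; run_generation is a placeholder for whatever inference call you actually use (ComfyUI workflow, diffusers pipeline, etc.):

```python
# Quick peak-VRAM probe. run_generation() is a hypothetical stand-in for
# whatever inference call you actually use.
import torch

torch.cuda.reset_peak_memory_stats()
run_generation()  # e.g. one 1024x1024, 4-step generation

peak = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak:.2f} GiB")
# Note: this only counts PyTorch's own allocations; check nvidia-smi for
# the process total if you want to compare against an 8 GB card's limit.
```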
Ah, I think I get it now. I didn't notice the merging tip part before.
I did not run that VRAM test.
@drbaph
Just a heads up, we need that merge in NF4... like... ASAP.
https://www.reddit.com/r/StableDiffusion/comments/1epcdov/bitsandbytes_guidelines_and_flux_6gb8gb_vram/
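For context, NF4 here refers to bitsandbytes' 4-bit NormalFloat quantization from the linked thread. A minimal round trip on a single tensor looks like this; quantizing a whole Flux checkpoint is the same call applied per weight tensor:

```python
# Minimal NF4 round trip with bitsandbytes (requires a CUDA build).
import torch
import bitsandbytes.functional as bnb

w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

# 4-bit NormalFloat: two weights packed per byte, plus small per-block stats.
w_nf4, quant_state = bnb.quantize_4bit(w, quant_type="nf4")
w_back = bnb.dequantize_4bit(w_nf4, quant_state)

print(f"fp16 bytes: {w.numel() * 2}, nf4 bytes: {w_nf4.numel()}")
print(f"max abs error: {(w - w_back).abs().max().item():.4f}")
```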