Not running on MacOS ComfyUI

#13
by FiditeNemini - opened

Hi folks,

Hoping this is a simple fix. I'm currently getting this error when running the schnell or dev model on a Mac with a working ComfyUI installation, using the default workflow.

"Error occurred when executing SamplerCustomAdvanced: BFloat16 is not supported on MPS"

Any tips on how to resolve this?

Thank you,
Will.

Same issue here. I have no idea how to fix it.

I did try --force-fp16 --cpu as startup parameters when launching Comfy, and it seems to work then, just really slowly, especially on the -dev version where you need more steps. This worked for my setup, but YMMV. No idea if this is even supposed to work, tbh.
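For anyone following along, the launch looks roughly like this (just a minimal sketch; the ComfyUI path is an example, adjust it to wherever your checkout lives):

cd ~/ComfyUI                        # example path to your ComfyUI checkout
python main.py --force-fp16 --cpu   # force fp16 weights and run sampling on the CPU instead of MPS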

Yeah, it works with --cpu, but it's very, very slow. Maybe you could run it with CUDA.

Got this to work yesterday when I came across someone else's solution.
Instead of the nightly builds of torch, they used pip3 install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1
I deleted my previous conda environments and used the above command to install everything needed in a newly created environment.

This worked for me and now I am running both Schnell and Dev without any hiccups on my M1 Max with 64GB of RAM.

Schnell takes about 110-120 seconds for a 4 iteration render, while Dev takes much longer for a 20 iteration render.
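In case it helps anyone reproduce this, a rough sketch of that environment rebuild, assuming conda and an existing ComfyUI checkout (the env name and path are placeholders; you'll also want ComfyUI's own requirements reinstalled into the new env):

conda create -n comfyui python=3.11                               # fresh environment, name is arbitrary
conda activate comfyui
pip3 install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1   # pinned instead of the nightly builds
cd ~/ComfyUI                                                      # example path to your ComfyUI checkout
pip3 install -r requirements.txt                                  # ComfyUI's other dependencies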

@karanf, thanks very much for the tip. That worked much better than --cpu: more elegant, and a lot faster!

I have 24GB unified memory. Is there any way I can run it locally? When trying to load the quantized model, I am getting this error.
!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

macOS can't run FLUX quantised at the moment; Macs do not support fp8. There's also a bug in PyTorch, triggered by FLUX, that causes the noisy image at the top of the issue with macOS 14 and PyTorch 2.4.
So to run it you need to use fp16 or bf16 with PyTorch 2.3.1.

You'll also need whatever setting gets ComfyUI to have PyTorch ignore the MPS suggested allocation limit (export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0). It'll run, but it'll swap.
I don't use ComfyUI, I use Diffusers, and I get 70-90 seconds per iteration on my 24GB, 10-GPU-core iMac.
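If you want to double-check what you're actually running before launching, something like this works from the environment you start ComfyUI or Diffusers from (just a sanity-check sketch):

python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"   # should report 2.3.1 and True
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0   # lets PyTorch allocate past the MPS limit; expect swapping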

Thanks for the information and the tip. I loaded the model in default mode in ComfyUI; at one point it triggered 30GB of swap and took 829 seconds to generate this!!! I think it is a futile exercise.
(attached image: ComfyUI_00149_.png)

(attached screenshot: Screenshot 2024-08-07 at 6.49.46 PM.png)

Working on trying to get this running on Mps pro

Great! Thanks!

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
How do I use this to fix the error on my Mac?
(attached screenshot: Screenshot 2024-08-23 at 12.14.54 AM.png)

Execute
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
before you run ComfyUI. (I don't use ComfyUI; I'm assuming you run it from a terminal command.)

Noob here,
can you explain a little bit more?
Thank you.

M3iMac:~ $ cd ComfyUI
M3iMac:ComfyUI $ export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
M3iMac:ComfyUI $ python main.py

I should point out that doing this will let ComfyUI have free rein with the memory on your Mac, including swap space, and may make it grind to a halt.

Hey,
I did that and that error disappeared, but now it shows a different error.
(attached screenshot: Screenshot 2024-08-23 at 1.46.22 AM.png)

You're trying to use an 8-bit quantised model that isn't supported by MPS. I'm not sure the underlying tech (torch) supports any of the quantised formats people are using with Flux on Macs.

Okay,
can you suggest what I can use on my MacBook M1?
Is there any way I can use Flux on my MacBook, or an alternative to Flux? I'm totally new to this.
Thank you for everything.

How much memory?
You're probably going to be limited to DiffusionKit or maybe DrawThings if you've got less than 32GB. DiffusionKit is quite limited at the moment, but it can generate using 4-bit FLUX.1-schnell.
I've not used DrawThings as it's closed source, and I don't think Flux is in the publicly available version.
If you've got 32GB or more, just run the normal fp16 version of Flux in ComfyUI, or use Diffusers if you know Python (read everything above for known issues).

If you're talking about RAM,
then I have only 8GB.

Then you're pretty much stuffed with regards to Flux; none of the tricks used to reduce usage to that kind of level work on Macs.
You have enough memory for SD1.5 models, and you can run SDXL at 768x768. You'll use swap to go higher than that resolution-wise, but it will work, just slowly.

Best to ask on whatever ComfyUI support forums exist about what you can do to get ComfyUI running on that configuration of Mac.

Thank you.
I want to learn about models and everything.
Can you tell me where I can start?
Can you suggest something?

The pinned torch==2.3.1 install also worked for me, but it does seem a bit slow. I am on an M1 Ultra / 64GB RAM and it is taking maybe 5 minutes for Dev on a 1024x1024 render using fp16 from a cold start (first render after starting). A subsequent render in the same workflow was about 4 minutes.
Also, I am BRAND NEW to ComfyUI and not much better at AI image generation (I was using Fooocus a little before, but wanted Flux.1 for the hands).
I noticed a couple of soft errors in the terminal when using the workflow from this site (anime girl holding a cake). These appeared on the first render, before the progress bar, but did not show again after changing the prompt.
clip missing: ['text_projection.weight']
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
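If that tokenizers warning bothers you, the message itself gives the fix: set the variable before launching, e.g.

export TOKENIZERS_PARALLELISM=false   # silences the fork/parallelism warning from huggingface tokenizers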

I have got it working with DiffusionKit, quantised, on 24GB. It takes around 200 seconds for 1024x1024. For ComfyUI, I tried the 2-bit GGUF and it works without triggering swap, but I need to offload the unsupported operations to the CPU, and thus it is painfully slow.
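For anyone wanting to try the GGUF route in ComfyUI: loading GGUF checkpoints needs a custom node. I'm assuming the commonly referenced city96/ComfyUI-GGUF node here (check its own README for the authoritative steps), but the install is roughly:

cd ~/ComfyUI/custom_nodes                         # example path to your ComfyUI checkout
git clone https://github.com/city96/ComfyUI-GGUF
pip3 install -r ComfyUI-GGUF/requirements.txt     # installs the node's gguf dependency into your ComfyUI env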

200 seconds sounds about right for a 10-GPU-core M3. On 24GB, if you enable low memory mode, DiffusionKit will run a non-quantised version without swapping too, in roughly the same time, assuming other stuff is not using a big chunk of memory as well.

I've finally given in and tried DrawThings. It has q5 (labelled as 8-bit) and q8p models, which work just as fast and, according to Activity Monitor, never go above 6.5GB unless you batch stuff.

FiditeNemini changed discussion status to closed
