Not running on MacOS ComfyUI

#13
by FiditeNemini - opened

Hi folks,

Hoping this is a simple fix. I'm currently getting this error when running the schnell or dev model on a Mac with a working ComfyUI installation, using the default workflow.

"Error occurred when executing SamplerCustomAdvanced: BFloat16 is not supported on MPS"

Any tips on how to resolve this?

Thank you,
Will.

Same issue here. I have no idea how to fix it.

I did try --force-fp16 --cpu as startup parameters when launching Comfy, and it seems to work then, just really slowly, especially on the -dev version where you need more steps. This worked for my setup, but YMMV. No idea if this is even supposed to work, tbh.
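For anyone following along, the launch looks roughly like this (just a minimal sketch; the ComfyUI path is an example, adjust it to wherever your checkout lives):

cd ~/ComfyUI                        # example path to your ComfyUI checkout
python main.py --force-fp16 --cpu   # force fp16 weights and run sampling on the CPU instead of MPS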

Yeah, it works with --cpu, but it's very, very slow. Maybe you could run it with CUDA.

Got this to work yesterday when I came across someone else's solution.
Instead of the nightly builds of torch, they used pip3 install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1
I deleted my previous conda environments and used the above command to install everything needed in a newly created environment.

This worked for me and now I am running both Schnell and Dev without any hiccups on my M1 Max with 64GB of RAM.

Schnell takes about 110-120 seconds for a 4 iteration render, while Dev takes much longer for a 20 iteration render.
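In case it helps anyone reproduce this, a rough sketch of that environment rebuild, assuming conda and an existing ComfyUI checkout (the env name and path are placeholders; you'll also want ComfyUI's own requirements reinstalled into the new env):

conda create -n comfyui python=3.11                               # fresh environment, name is arbitrary
conda activate comfyui
pip3 install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1   # pinned instead of the nightly builds
cd ~/ComfyUI                                                      # example path to your ComfyUI checkout
pip3 install -r requirements.txt                                  # ComfyUI's other dependencies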

@karanf, thanks very much for the tip. That worked much better than --cpu: more elegant, and a lot faster!

I have 24GB unified memory. Is there any way I can run it locally? When trying to load the quantized model, I am getting this error.
!!! Exception during processing!!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

macOS can't run FLUX quantised at the moment; Macs do not support fp8. There's also a bug in PyTorch, triggered by FLUX, that causes the noisy image at the top of the issue with macOS 14 and PyTorch 2.4.
So to run it you need to use fp16 or bf16 with PyTorch 2.3.1.

You'll also need whatever setting gets ComfyUI to have PyTorch ignore the MPS suggested allocation limit (export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0). It'll run, but it'll swap.
I don't use ComfyUI, I use Diffusers, and I get 70-90 seconds per iteration on my 24GB, 10-GPU-core iMac.
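If you want to double-check what you're actually running before launching, something like this works from the environment you start ComfyUI or Diffusers from (just a sanity-check sketch):

python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"   # should report 2.3.1 and True
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0   # lets PyTorch allocate past the MPS limit; expect swapping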

Thanks for the information and the tip. I loaded the model in default mode in ComfyUI; at one point it triggered 30GB of swap and took 829 seconds to generate this!!! I think it is a futile exercise.
(attached image: ComfyUI_00149_.png)

(attached screenshot: Screenshot 2024-08-07 at 6.49.46 PM.png)

Working on trying to get this running on Mps pro

Great! Thanks!

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
How do I use this to fix the error on my Mac?
(attached screenshot: Screenshot 2024-08-23 at 12.14.54 AM.png)

Execute
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
before you run ComfyUI. (I don't use ComfyUI; I'm assuming you run it from a terminal command.)

Noob here,
can you explain a little bit more?
Thank you.

M3iMac:~ $ cd ComfyUI
M3iMac:ComfyUI $ export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
M3iMac:ComfyUI $ python main.py

I should point out that doing this will let ComfyUI have free rein with the memory on your Mac, including swap space, and may make it grind to a halt.

Hey,
I did that and that error disappeared, but now it shows a different error.
(attached screenshot: Screenshot 2024-08-23 at 1.46.22 AM.png)

You're trying to use an 8-bit quantised model that isn't supported by MPS. I'm not sure the underlying tech (torch) supports any of the quantised formats people are using with Flux on Macs.

Okay,
can you suggest what I can use on my MacBook M1?
Is there any way I can use Flux on my MacBook, or an alternative to Flux? I'm totally new to this.
Thank you for everything.

How much memory?
You're probably going to be limited to DiffusionKit or maybe DrawThings if you've got less than 32GB. DiffusionKit is quite limited at the moment, but it can generate using 4-bit FLUX.1-schnell.
I've not used DrawThings as it's closed source, and I don't think Flux is in the publicly available version.
If you've got 32GB or more, just run the normal fp16 version of Flux in ComfyUI, or use Diffusers if you know Python (read everything above for known issues).

If you're talking about RAM,
then I have only 8GB.

Then you're pretty much stuffed with regards to Flux; none of the tricks used to reduce usage to that kind of level work on Macs.
You have enough memory for SD1.5 models, and you can run SDXL at 768x768. You'll use swap to go higher than that resolution-wise, but it will work, just slowly.

Best to ask on whatever ComfyUI support forums exist about what you can do to get ComfyUI running on that configuration of Mac.

Thank you.
I want to learn about models and everything.
Can you tell me where I can start?
Can you suggest something?

The pinned torch==2.3.1 install also worked for me, but it does seem a bit slow. I am on an M1 Ultra / 64GB RAM and it is taking maybe 5 minutes for Dev on a 1024x1024 render using fp16 from a cold start (first render after starting). A subsequent render in the same workflow was about 4 minutes.
Also, I am BRAND NEW to ComfyUI and not much better at AI image generation (I was using Fooocus a little before, but wanted Flux.1 for the hands).
I noticed a couple of soft errors in the terminal when using the workflow from this site (anime girl holding a cake). These appeared on the first render, before the progress bar, but did not show again after changing the prompt.
clip missing: ['text_projection.weight']
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
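If that tokenizers warning bothers you, the message itself gives the fix: set the variable before launching, e.g.

export TOKENIZERS_PARALLELISM=false   # silences the fork/parallelism warning from huggingface tokenizers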

I have got it working with DiffusionKit, quantised, on 24GB. It takes around 200 seconds for 1024x1024. For ComfyUI, I tried the 2-bit GGUF and it works without triggering swap, but I need to offload the unsupported operations to the CPU, and thus it is painfully slow.
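For anyone wanting to try the GGUF route in ComfyUI: loading GGUF checkpoints needs a custom node. I'm assuming the commonly referenced city96/ComfyUI-GGUF node here (check its own README for the authoritative steps), but the install is roughly:

cd ~/ComfyUI/custom_nodes                         # example path to your ComfyUI checkout
git clone https://github.com/city96/ComfyUI-GGUF
pip3 install -r ComfyUI-GGUF/requirements.txt     # installs the node's gguf dependency into your ComfyUI env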

200 seconds sounds about right for a 10-GPU-core M3. On 24GB, if you enable low memory mode, DiffusionKit will run a non-quantised version without swapping too, in roughly the same time, assuming other stuff is not using a big chunk of memory as well.

I've finally given in and tried DrawThings. It has q5 (labelled as 8-bit) and q8p models, which work just as fast and, according to Activity Monitor, never go above 6.5GB unless you batch stuff.

FiditeNemini changed discussion status to closed
