Using Stable Diffusion with Core ML on Apple Silicon
Thanks to Apple engineers, you can now run Stable Diffusion on Apple Silicon using Core ML!
This Apple repo provides conversion scripts and inference code based on 🧨 Diffusers, and we love it! To make it as easy as possible for you, we converted the weights ourselves and put the Core ML versions of the models in the Hugging Face Hub.
Update: some weeks after this post was written, we created a native Swift app that you can use to run Stable Diffusion effortlessly on your own hardware. We released it in the Mac App Store, along with the source code so that other projects can use it.
The rest of this post guides you on how to use the converted weights in your own code or convert additional weights yourself.
Available Checkpoints
The official Stable Diffusion checkpoints are already converted and ready for use:
- Stable Diffusion v1.4: converted original
- Stable Diffusion v1.5: converted original
- Stable Diffusion v2 base: converted original
- Stable Diffusion v2.1 base: converted original
Core ML supports all the compute units available in your device: CPU, GPU and Apple's Neural Engine (NE). It's also possible for Core ML to run different portions of the model in different devices to maximize performance.
There are several variants of each model that may yield different performance depending on the hardware you use. We recommend you try them out and stick with the one that works best in your system. Read on for details.
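If you're curious how compute unit selection looks at the Core ML level, here is a minimal Python sketch using coremltools (the .mlpackage path is a hypothetical placeholder; Apple's inference scripts, shown later, expose this choice through their --compute-unit / --compute-units options):
import coremltools as ct
# Load a converted Core ML package, restricting execution to CPU + Neural Engine.
# Other values: ct.ComputeUnit.ALL, ct.ComputeUnit.CPU_ONLY, ct.ComputeUnit.CPU_AND_GPU.
model = ct.models.MLModel(
    "path/to/TextEncoder.mlpackage",  # placeholder path to a converted model
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)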
Notes on Performance
There are several variants per model:
- "Original" attention vs "split_einsum". These are two alternative implementations of the critical attention blocks. split_einsum was previously introduced by Apple, and is compatible with all the compute units (CPU, GPU and Apple's Neural Engine). original, on the other hand, is only compatible with CPU and GPU. Nevertheless, original can be faster than split_einsum on some devices, so do check it out!
- "ML Packages" vs "Compiled" models. The former is suitable for Python inference, while the compiled version is required for Swift code. The compiled models in the Hub split the large UNet model weights into several files for compatibility with iOS and iPadOS devices. This corresponds to the --chunk-unet conversion option.
At the time of this writing, we got the best results on a MacBook Pro (M1 Max, 32 GPU cores, 64 GB RAM) using the following combination:
- original attention.
- all compute units (see next section for details).
- macOS Ventura 13.1 Beta 4 (22C5059b).
With these, it took 18s to generate one image with the Core ML version of Stable Diffusion v1.4 🤯.
⚠️ Note
Several improvements to Core ML were introduced in macOS Ventura 13.1, and they are required by Apple's implementation. You may get black images (and much slower generation times) if you use previous versions of macOS.
Each model repo is organized in a tree structure that provides these different variants:
coreml-stable-diffusion-v1-4
├── README.md
├── original
│ ├── compiled
│ └── packages
└── split_einsum
├── compiled
└── packages
You can download and use the variant you need as shown below.
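If you'd like to check which variants a given repo provides without browsing the Hub, here is a small sketch using the huggingface_hub library (installed in the Prerequisites below); the repo id matches the v1.4 checkpoint above:
from huggingface_hub import list_repo_files
# List every file in the repo and collect the top-level variant folders.
files = list_repo_files("apple/coreml-stable-diffusion-v1-4")
variants = sorted({f.split("/")[0] for f in files if "/" in f})
print(variants)  # expected to include 'original' and 'split_einsum'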
Core ML Inference in Python
Prerequisites
pip install huggingface_hub
pip install git+https://github.com/apple/ml-stable-diffusion
Download the Model Checkpoints
To run inference in Python, you have to use one of the versions stored in the packages folders, because the compiled ones are only compatible with Swift. You may choose whether you want to use the original or split_einsum attention styles.
This is how you'd download the original attention variant from the Hub:
from huggingface_hub import snapshot_download
from pathlib import Path
repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"
model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")
The code above will place the downloaded model snapshot inside a directory called models.
Inference
Once you have downloaded a snapshot of the model, the easiest way to run inference would be to use Apple's Python script.
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o </path/to/output/image> --compute-unit ALL --seed 93
The -i argument (models/coreml-stable-diffusion-v1-4_original_packages in the example) should point to the checkpoint you downloaded in the step above, and --compute-unit indicates the hardware you want to allow for inference. It must be one of the following options: ALL, CPU_AND_GPU, CPU_ONLY, CPU_AND_NE. You may also provide an output path with -o, and a seed with --seed for reproducibility.
The inference script assumes the original version of the Stable Diffusion model, stored in the Hub as CompVis/stable-diffusion-v1-4. If you use another model, you have to specify its Hub id in the inference command line, using the --model-version option. This works both for models already supported, and for custom models you trained or fine-tuned yourself.
For Stable Diffusion 1.5 (Hub id: runwayml/stable-diffusion-v1-5):
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-v1-5_original_packages --model-version runwayml/stable-diffusion-v1-5
For Stable Diffusion 2 base (Hub id: stabilityai/stable-diffusion-2-base):
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
Core ML Inference in Swift
Running inference in Swift is slightly faster than in Python because the models are already compiled in the mlmodelc format. The difference is noticeable on app startup, when the model is loaded, but shouldn't be noticeable if you run several generations afterwards.
Download
To run inference in Swift on your Mac, you need one of the compiled checkpoint versions. We recommend you download them locally using Python code similar to the one we showed above, but selecting one of the compiled variants:
from huggingface_hub import snapshot_download
from pathlib import Path
repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/compiled"
model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")
Inference
To run inference, please clone Apple's repo:
git clone https://github.com/apple/ml-stable-diffusion
cd ml-stable-diffusion
Then run Apple's command-line tool through Swift Package Manager:
swift run StableDiffusionSample --resource-path models/coreml-stable-diffusion-v1-4_original_compiled --compute-units all "a photo of an astronaut riding a horse on mars"
The --resource-path argument must point to one of the checkpoints downloaded in the previous step, so please make sure it contains compiled Core ML bundles with the extension .mlmodelc. The --compute-units argument has to be one of these values: all, cpuOnly, cpuAndGPU, cpuAndNeuralEngine.
For more details, please refer to the instructions in Apple's repo.
Bring Your Own Model
If you have created your own models compatible with Stable Diffusion (for example, if you used Dreambooth, Textual Inversion or fine-tuning), then you have to convert the models yourself. Fortunately, Apple provides a conversion script that allows you to do so.
For this task, we recommend you follow these instructions.
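As a rough sketch of what the conversion looks like (the flags below come from Apple's repo at the time of writing and may change, so treat its README as the source of truth; <your-hub-model-id> and <output-mlpackages-directory> are placeholders):
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version <your-hub-model-id> -o <output-mlpackages-directory>
Add --chunk-unet if you target iOS or iPadOS, and --bundle-resources-for-swift-cli if you plan to run the Swift inference code; again, Apple's instructions list all the available options.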
Next Steps
We are really excited about the opportunities this brings and can't wait to see what the community can create from here. Some potential ideas are:
- Native, high-quality apps for Mac, iPhone and iPad.
- Bring additional schedulers to Swift, for even faster inference.
- Additional pipelines and tasks.
- Explore quantization techniques and further optimizations.
Looking forward to seeing what you create!