Apple CoreML conversion tool failing on 3.5-medium
#9 opened by matthewmihok
I'm currently attempting to convert 3.5-medium using https://github.com/apple/ml-stable-diffusion. Running the following command:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --chunk-unet --attention-implementation SPLIT_EINSUM_V2 --convert-text-encoder --convert-vae-encoder --convert-vae-decoder --convert-safety-checker --model-version stabilityai/stable-diffusion-3.5-medium -o models/
produces the following output:
Torch version 2.5.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.4.0 is the most recent version that has been tested.
Fail to import BlobReader from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Fail to import BlobWriter from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
INFO:__main__:Initializing DiffusionPipeline with stabilityai/stable-diffusion-3.5-medium..
model_index.json: 100%|████████████████████████████████████████████████████████████████| 706/706 [00:00<00:00, 972kB/s]
A mixture of fp16 and non-fp16 filenames will be loaded.
Loaded fp16 filenames:
[text_encoder_3/model.fp16-00002-of-00002.safetensors, text_encoder_3/model.fp16-00001-of-00002.safetensors, text_encoder_2/model.fp16.safetensors, text_encoder/model.fp16.safetensors]
Loaded non-fp16 filenames:
[vae/diffusion_pytorch_model.safetensors, vae copy/diffusion_pytorch_model.safetensors, transformer/diffusion_pytorch_model.safetensors]
If this behavior is not expected, please check your folder structure.
scheduler/scheduler_config.json: 100%|████████████████████████████████████████████████| 141/141 [00:00<00:00, 1.04MB/s]
text_encoder/config.json: 100%|███████████████████████████████████████████████████████| 574/574 [00:00<00:00, 6.17MB/s]
text_encoder_3/config.json: 100%|█████████████████████████████████████████████████████| 740/740 [00:00<00:00, 8.23MB/s]
text_encoder_2/config.json: 100%|█████████████████████████████████████████████████████| 570/570 [00:00<00:00, 6.70MB/s]
tokenizer/special_tokens_map.json: 100%|██████████████████████████████████████████████| 588/588 [00:00<00:00, 3.87MB/s]
tokenizer/tokenizer_config.json: 100%|████████████████████████████████████████████████| 705/705 [00:00<00:00, 7.84MB/s]
(…)oder_3/model.safetensors.index.fp16.json: 100%|█████████████████████████████████| 21.0k/21.0k [00:00<00:00, 573kB/s]
tokenizer_2/special_tokens_map.json: 100%|████████████████████████████████████████████| 576/576 [00:00<00:00, 3.33MB/s]
tokenizer_2/tokenizer_config.json: 100%|██████████████████████████████████████████████| 856/856 [00:00<00:00, 5.10MB/s]
tokenizer/merges.txt: 100%|██████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 806kB/s]
tokenizer_3/special_tokens_map.json: 100%|████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 12.9MB/s]
tokenizer/vocab.json: 100%|████████████████████████████████████████████████████████| 1.06M/1.06M [00:01<00:00, 891kB/s]
spiece.model: 100%|█████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 1.13MB/s]
tokenizer_3/tokenizer_config.json: 100%|██████████████████████████████████████████| 20.6k/20.6k [00:00<00:00, 1.34MB/s]
transformer/config.json: 100%|████████████████████████████████████████████████████████| 524/524 [00:00<00:00, 1.51MB/s]
vae/config.json: 100%|████████████████████████████████████████████████████████████████| 809/809 [00:00<00:00, 3.59MB/s]
tokenizer_3/tokenizer.json: 100%|██████████████████████████████████████████████████| 2.42M/2.42M [00:03<00:00, 717kB/s]
diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████| 168M/168M [02:17<00:00, 1.22MB/s]
model.fp16.safetensors: 100%|███████████████████████████████████████████████████████| 247M/247M [03:58<00:00, 1.04MB/s]
model.fp16.safetensors: 100%|█████████████████████████████████████████████████████| 1.39G/1.39G [16:37<00:00, 1.39MB/s]
model.fp16-00002-of-00002.safetensors: 100%|██████████████████████████████████████| 4.53G/4.53G [44:01<00:00, 1.71MB/s]
model.fp16-00001-of-00002.safetensors: 100%|██████████████████████████████████████| 4.99G/4.99G [45:59<00:00, 1.81MB/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████| 4.94G/4.94G [46:20<00:00, 1.78MB/s]
Fetching 27 files: 100%|██████████████████████████████████████████████████████████████| 27/27 [46:22<00:00, 103.05s/it]
Keyword arguments {'use_auth_token': True} are not expected by StableDiffusion3Pipeline and will be ignored.
Loading pipeline components...: 11%|█████ | 1/9 [00:00<00:02, 3.02it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 33%|██████████████████ | 3/9 [00:00<00:00, 6.01it/s]The config attributes {'dual_attention_layers': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'qk_norm': 'rms_norm'} were passed to SD3Transformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 33%|██████████████████ | 3/9 [00:07<00:14, 2.46s/it]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1738, in <module>
main(args)
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1485, in main
pipe = get_pipeline(args)
^^^^^^^^^^^^^^^^^^
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1470, in get_pipeline
pipe = DiffusionPipeline.from_pretrained(model_version,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/pipelines/pipeline_utils.py", line 876, in from_pretrained
loaded_sub_model = load_sub_model(
^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 700, in load_sub_model
loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/models/modeling_utils.py", line 747, in from_pretrained
unexpected_keys = load_model_dict_into_meta(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/models/model_loading_utils.py", line 154, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load /Users/mihok/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/4ab6c3331a7591f128a21e617f0d9d3fc7e06e42/transformer because transformer_blocks.0.norm1.linear.bias expected shape tensor(..., device='meta', size=(9216,)), but got torch.Size([13824]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
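For what it's worth, the warning earlier in the log about `dual_attention_layers` and `qk_norm` being passed to SD3Transformer2DModel but ignored suggests the installed diffusers build predates SD 3.5 support, which would also explain the transformer weight shape mismatch (torch.Size([13824]) vs (9216,)). A quick stdlib check of the installed version — note the 0.31.0 threshold is my assumption about when SD 3.5 support landed in diffusers, not something confirmed by this log:

```python
from importlib.metadata import version, PackageNotFoundError

def supports_sd35(ver: str, minimum=(0, 31, 0)) -> bool:
    # Naive numeric version compare; the (0, 31, 0) cutoff is an assumption
    # about when diffusers' SD3Transformer2DModel gained the
    # `dual_attention_layers` / `qk_norm` config keys used by SD 3.5.
    parts = tuple(int(p) for p in ver.split(".")[:3] if p.isdigit())
    return parts >= minimum

try:
    installed = version("diffusers")
    print(installed, "-> supports SD 3.5 config keys:", supports_sd35(installed))
except PackageNotFoundError:
    print("diffusers not installed")
```

If the check comes back False, a `pip install -U diffusers` before re-running torch2coreml would be the first thing to try.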