Apple CoreML conversion tool failing on 3.5-medium
#9 opened by matthewmihok
I'm currently attempting to convert 3.5-medium using https://github.com/apple/ml-stable-diffusion. Running the following command:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --chunk-unet --attention-implementation SPLIT_EINSUM_V2 --convert-text-encoder --convert-vae-encoder --convert-vae-decoder --convert-safety-checker --model-version stabilityai/stable-diffusion-3.5-medium -o models/
produces the following output:
Torch version 2.5.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.4.0 is the most recent version that has been tested.
Fail to import BlobReader from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Fail to import BlobWriter from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
INFO:__main__:Initializing DiffusionPipeline with stabilityai/stable-diffusion-3.5-medium..
model_index.json: 100%|████████████████████████████████████████████████████████████████| 706/706 [00:00<00:00, 972kB/s]
A mixture of fp16 and non-fp16 filenames will be loaded.
Loaded fp16 filenames:
[text_encoder_3/model.fp16-00002-of-00002.safetensors, text_encoder_3/model.fp16-00001-of-00002.safetensors, text_encoder_2/model.fp16.safetensors, text_encoder/model.fp16.safetensors]
Loaded non-fp16 filenames:
[vae/diffusion_pytorch_model.safetensors, vae copy/diffusion_pytorch_model.safetensors, transformer/diffusion_pytorch_model.safetensors]
If this behavior is not expected, please check your folder structure.
scheduler/scheduler_config.json: 100%|████████████████████████████████████████████████| 141/141 [00:00<00:00, 1.04MB/s]
text_encoder/config.json: 100%|███████████████████████████████████████████████████████| 574/574 [00:00<00:00, 6.17MB/s]
text_encoder_3/config.json: 100%|█████████████████████████████████████████████████████| 740/740 [00:00<00:00, 8.23MB/s]
text_encoder_2/config.json: 100%|█████████████████████████████████████████████████████| 570/570 [00:00<00:00, 6.70MB/s]
tokenizer/special_tokens_map.json: 100%|██████████████████████████████████████████████| 588/588 [00:00<00:00, 3.87MB/s]
tokenizer/tokenizer_config.json: 100%|████████████████████████████████████████████████| 705/705 [00:00<00:00, 7.84MB/s]
(…)oder_3/model.safetensors.index.fp16.json: 100%|█████████████████████████████████| 21.0k/21.0k [00:00<00:00, 573kB/s]
tokenizer_2/special_tokens_map.json: 100%|████████████████████████████████████████████| 576/576 [00:00<00:00, 3.33MB/s]
tokenizer_2/tokenizer_config.json: 100%|██████████████████████████████████████████████| 856/856 [00:00<00:00, 5.10MB/s]
tokenizer/merges.txt: 100%|██████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 806kB/s]
tokenizer_3/special_tokens_map.json: 100%|████████████████████████████████████████| 2.54k/2.54k [00:00<00:00, 12.9MB/s]
tokenizer/vocab.json: 100%|████████████████████████████████████████████████████████| 1.06M/1.06M [00:01<00:00, 891kB/s]
spiece.model: 100%|█████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 1.13MB/s]
tokenizer_3/tokenizer_config.json: 100%|██████████████████████████████████████████| 20.6k/20.6k [00:00<00:00, 1.34MB/s]
transformer/config.json: 100%|████████████████████████████████████████████████████████| 524/524 [00:00<00:00, 1.51MB/s]
vae/config.json: 100%|████████████████████████████████████████████████████████████████| 809/809 [00:00<00:00, 3.59MB/s]
tokenizer_3/tokenizer.json: 100%|██████████████████████████████████████████████████| 2.42M/2.42M [00:03<00:00, 717kB/s]
diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████| 168M/168M [02:17<00:00, 1.22MB/s]
model.fp16.safetensors: 100%|███████████████████████████████████████████████████████| 247M/247M [03:58<00:00, 1.04MB/s]
model.fp16.safetensors: 100%|█████████████████████████████████████████████████████| 1.39G/1.39G [16:37<00:00, 1.39MB/s]
model.fp16-00002-of-00002.safetensors: 100%|██████████████████████████████████████| 4.53G/4.53G [44:01<00:00, 1.71MB/s]
model.fp16-00001-of-00002.safetensors: 100%|██████████████████████████████████████| 4.99G/4.99G [45:59<00:00, 1.81MB/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████| 4.94G/4.94G [46:20<00:00, 1.78MB/s]
Fetching 27 files: 100%|██████████████████████████████████████████████████████████████| 27/27 [46:22<00:00, 103.05s/it]
Keyword arguments {'use_auth_token': True} are not expected by StableDiffusion3Pipeline and will be ignored.
Loading pipeline components...: 11%|█████ | 1/9 [00:00<00:02, 3.02it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 33%|██████████████████ | 3/9 [00:00<00:00, 6.01it/s]The config attributes {'dual_attention_layers': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'qk_norm': 'rms_norm'} were passed to SD3Transformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 33%|██████████████████ | 3/9 [00:07<00:14, 2.46s/it]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1738, in <module>
main(args)
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1485, in main
pipe = get_pipeline(args)
^^^^^^^^^^^^^^^^^^
File "/Users/mihok/Development/src/github.com/apple/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1470, in get_pipeline
pipe = DiffusionPipeline.from_pretrained(model_version,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/pipelines/pipeline_utils.py", line 876, in from_pretrained
loaded_sub_model = load_sub_model(
^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 700, in load_sub_model
loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/models/modeling_utils.py", line 747, in from_pretrained
unexpected_keys = load_model_dict_into_meta(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/lib/python3.12/site-packages/diffusers/models/model_loading_utils.py", line 154, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load /Users/mihok/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3.5-medium/snapshots/4ab6c3331a7591f128a21e617f0d9d3fc7e06e42/transformer because transformer_blocks.0.norm1.linear.bias expected shape tensor(..., device='meta', size=(9216,)), but got torch.Size([13824]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
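For what it's worth, the warning earlier in the log about `dual_attention_layers` and `qk_norm` being passed to SD3Transformer2DModel but ignored suggests the installed diffusers build predates SD 3.5 support, which would also explain the transformer weight shape mismatch (torch.Size([13824]) vs (9216,)). A quick stdlib check of the installed version — note the 0.31.0 threshold is my assumption about when SD 3.5 support landed in diffusers, not something confirmed by this log:

```python
from importlib.metadata import version, PackageNotFoundError

def supports_sd35(ver: str, minimum=(0, 31, 0)) -> bool:
    # Naive numeric version compare; the (0, 31, 0) cutoff is an assumption
    # about when diffusers' SD3Transformer2DModel gained the
    # `dual_attention_layers` / `qk_norm` config keys used by SD 3.5.
    parts = tuple(int(p) for p in ver.split(".")[:3] if p.isdigit())
    return parts >= minimum

try:
    installed = version("diffusers")
    print(installed, "-> supports SD 3.5 config keys:", supports_sd35(installed))
except PackageNotFoundError:
    print("diffusers not installed")
```

If the check comes back False, a `pip install -U diffusers` before re-running torch2coreml would be the first thing to try.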