Running on Apple's M1 laptop?
Has anyone tried this on M1? There's no CUDA support, of course, but I haven't even gotten that far:
python3 generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token hf_XXXXX --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2
...
WARNING:root:No CUDA device detected, using cpu, expect slower speeds.
config.json: 100%|████████████████████| 1.30k/1.30k [00:00<00:00, 4.05MB/s]
configuration_openelm.py: 100%|████████████████████| 14.3k/14.3k [00:00<00:00, 38.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
- configuration_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_openelm.py: 100%|████████████████████| 39.3k/39.3k [00:00<00:00, 16.3MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
- modeling_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Traceback (most recent call last):
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 220, in <module>
    output_text, genertaion_time = generate(
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 85, in generate
    model = AutoModelForCausalLM.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 475, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 164, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/apple/OpenELM-270M-Instruct/1096244b62a03bedc770f8521512fd071f3aa5fd/modeling_openelm.py", line 15, in <module>
    from transformers.cache_utils import Cache, DynamicCache, StaticCache
ModuleNotFoundError: No module named 'transformers.cache_utils'
This is where it breaks:
# For licensing see accompanying LICENSE file.
# Copyright (C) 2024 Apple Inc. All Rights Reserved.
from typing import List, Optional, Tuple, Union
import torch
import torch.utils.checkpoint
from torch import Tensor, nn
from torch.nn import CrossEntropyLoss
from torch.nn import functional as F
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.cache_utils import Cache, DynamicCache, StaticCache
from transformers.modeling_outputs import (
BaseModelOutputWithPast,
CausalLMOutputWithPast,
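A quick way to confirm whether this is the problem on a given machine is to check that the installed transformers actually provides cache_utils (a minimal sketch, not part of the original script; cache_utils only ships with newer transformers releases):

import importlib.util
import transformers

# cache_utils only exists in newer transformers releases; if this prints
# False, the failing import in modeling_openelm.py is expected.
print("transformers version:", transformers.__version__)
print("cache_utils available:",
      importlib.util.find_spec("transformers.cache_utils") is not None)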
I got past the 'transformers' issue by pulling their GitHub repo and building from source. I then added "--device mps", which, after installing the PyTorch nightly build, gets past the 'No CUDA device' warning. However, loading the 3B-parameter model resulted in "RuntimeError: MPS backend out of memory (MPS allocated: 9.05 GB, other allocations: 832.00 KB, max allowed: 9.07 GB). Tried to allocate 36.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)." The error recurred even after setting that flag. The smaller 1.1B model gets past that, but now I am running into a permissions issue with the tokenizer (see the other discussion: https://huggingface.co/apple/OpenELM-3B-Instruct/discussions/4).
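For reference, a minimal sketch of the MPS setup described above (assuming PYTORCH_MPS_HIGH_WATERMARK_RATIO is set before PyTorch makes its first MPS allocation; as the error message itself warns, disabling the limit may cause system instability):

import os

# Must be set before the first MPS allocation; "0.0" removes the cap entirely,
# which the RuntimeError above warns may cause system failure.
os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")

import torch

# Fall back to CPU if the Metal backend is unavailable.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print("Using device:", device)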
Thanks for sharing your approach and I will give it a try...
Just an update:
In my case, the transformers problem was fixed with:
pip install transformers --upgrade
A plain pip install transformers without --upgrade didn't work.
I also had to get past the Meta Llama access-authorization hurdle for the tokenizer. After that, 270M-Instruct works!
A final update: I also got 3B-Instruct running on the M1, but a generation took 280-290 seconds (!). The GPU seemed to be in use at around 40% load, while the CPU occupied ~100% of one core.
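For anyone who wants to reproduce a rough timing without the wrapper script, here is a minimal sketch. Assumptions: the gated meta-llama/Llama-2-7b-hf tokenizer is the one OpenELM expects (this is what the access-authorization hurdle above refers to), and the 270M model is used because the 3B model may not fit in MPS memory, as noted earlier. It requires an approved Hugging Face login.

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"

# Gated repo: needs prior access approval and `huggingface-cli login`.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True
).to(device)

inputs = tok("Once upon a time there was", return_tensors="pt").to(device)
start = time.time()
out = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.2)
print(tok.decode(out[0], skip_special_tokens=True))
print(f"generation took {time.time() - start:.1f}s on {device}")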
Please remove your hugging face token from the comment also.
@shiwanlin how did you get the GPU to work? It is not using CUDA.
@chrisau168 I had set up Python and torch for Apple silicon earlier, not the Intel versions. Sorry for the late reply.
@hfprashant480 Thanks - it was my negligence :-( The token has been invalidated.