Compatibility with mps/Mac M1?
torch_dtype=auto doesn't seem to take mps. When trying to troubleshoot, I get:
AttributeError: 'GPTNeoXForCausalLM' object has no attribute 'mps'
I installed the nightly torch with:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Here are the changes I made:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablecode-instruct-alpha-3b",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = model.to("mps")  # moving bfloat16 weights to MPS triggers the TypeError below
model.mps()  # troubleshooting attempt; the model has no .mps() method, hence the AttributeError above
inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("mps")
tokens = model.generate(
    **inputs,
    max_new_tokens=48,
    temperature=0.2,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
and I get:
TypeError: BFloat16 is not supported on MPS
I thought float16 was made compatible with MPS recently?
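(For what it's worth, float16 and bfloat16 are different dtypes here; a minimal check to tell them apart on an MPS-enabled PyTorch build:)

import torch

# float16 tensors move to MPS fine on recent builds
x = torch.ones(2, 2, dtype=torch.float16).to("mps")
print(x * 2)

# this mirrors model.to("mps") with torch_dtype=torch.bfloat16; on builds without
# bfloat16 support on MPS it raises "TypeError: BFloat16 is not supported on MPS"
y = torch.ones(2, 2, dtype=torch.bfloat16).to("mps")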
This is not a solution, but you can run it using the CPU.
model.to("cpu") <---- Add this
inputs = tokenizer(
"###Instruction\nGenerate a java function to find number of CPU cores###Response\n",
return_tensors="pt",
return_token_type_ids=False, <---- Add this
).to("cpu") <---- Add this
tokens = model.generate(
**inputs,
max_new_tokens=48,
temperature=0.2,
do_sample=True,
pad_token_id=50256 <---- Add this
)
Thanks, that seemed to work, though using the CPU it takes about 10 minutes to generate an answer (as expected).
Looking forward to mps compatibility with PyTorch (https://pytorch.org/docs/stable/notes/mps.html).
Taking guidance from the PyTorch link above, I seem to have got it working with MPS on my M1 MacBook Pro.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")
else:
    print("Attempting to use MPS...")
    mps_device = torch.device("mps")

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablecode-instruct-alpha-3b",
    trust_remote_code=True,
)
model.to(mps_device)
inputs = tokenizer(
    "\n###Instruction\n\nGenerate a python function to find number of CPU cores\n\n###Response\n",
    return_tensors="pt",
    return_token_type_ids=False,
).to(mps_device)
tokens = model.generate(
    **inputs,
    max_new_tokens=48,
    temperature=0.2,
    do_sample=True,
    pad_token_id=50256,
    streamer=streamer,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
I changed max_new_tokens to something larger to get output of any real length, and added TextStreamer to visualize the output as it's generated. Generation is still slow with MPS, but it's definitely using it.
@ymgenesis thanks so much for this! After using your solution I ran into another issue:
RuntimeError: MPS does not support cumsum op with int64 input
Got it working on my M1 MacBook Pro by following this solution:
pip3 install --upgrade --no-deps --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Ref: https://github.com/pytorch/pytorch/issues/96610#issuecomment-1593230620
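To confirm the nightly actually fixes it, this just reproduces the failing op from the traceback (assuming an MPS device is available):

import torch

print(torch.__version__)

# torch.arange defaults to int64, so this is the op from the traceback; on builds
# without the fix it raises "RuntimeError: MPS does not support cumsum op with int64 input"
x = torch.arange(5, device="mps")
print(torch.cumsum(x, dim=0))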
You could use torch.float16 in place of torch.bfloat16; bfloat16 currently runs on CPU and CUDA only.
@LinyiZheng how can you change from bfloat16 to float16 in the terminal?
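As far as I can tell it's not a terminal flag; it's the torch_dtype argument in the script. A minimal sketch of the swap @LinyiZheng suggests (same load call as in the first post, only the dtype changed):

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablecode-instruct-alpha-3b",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # was torch.bfloat16
)
model = model.to("mps")  # float16 weights should move to MPS without the bfloat16 TypeError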