Compatibility with mps/Mac M1?

#6
by ymgenesis - opened

torch_dtype=auto doesn't seem to work with MPS.
I get AttributeError: 'GPTNeoXForCausalLM' object has no attribute 'mps' when trying to troubleshoot.
I installed the nightly torch with:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Here are the changes I made:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-instruct-alpha-3b",
  trust_remote_code=True,
  torch_dtype=torch.bfloat16,
)
model = model.to("mps")
model.mps()  # not a valid method; this call is what raises the AttributeError above
inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("mps")
tokens = model.generate(
  **inputs,
  max_new_tokens=48,
  temperature=0.2,
  do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

and I get:

TypeError: BFloat16 is not supported on MPS

I thought float16 was made compatible with MPS recently?

This is not a solution, but you can run it using the CPU.

model.to("cpu")    <---- Add this
inputs = tokenizer(
    "###Instruction\nGenerate a java function to find number of CPU cores###Response\n", 
    return_tensors="pt",
    return_token_type_ids=False,    <---- Add this
).to("cpu")    <---- Add this

tokens = model.generate(
  **inputs,
  max_new_tokens=48,
  temperature=0.2,
  do_sample=True,
  pad_token_id=50256    <---- Add this
)

Thanks, that seemed to work, though on the CPU it takes about 10 minutes to generate an answer (as expected).

Looking forward to MPS support in PyTorch (https://pytorch.org/docs/stable/notes/mps.html).

Taking guidance from the PyTorch link above, I seem to have gotten it working with MPS on my M1 MacBook Pro.

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    print("Attempting to use MPS...")
    mps_device = torch.device("mps")

    tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
    streamer = TextStreamer(tokenizer)
    model = AutoModelForCausalLM.from_pretrained(
      "stabilityai/stablecode-instruct-alpha-3b",
      trust_remote_code=True,
    )
    model.to(mps_device)
    
    inputs = tokenizer(
      "\n###Instruction\n\nGenerate a python function to find number of CPU cores\n\n###Response\n",
      return_tensors="pt",
      return_token_type_ids=False,
    ).to(mps_device)
    tokens = model.generate(
      **inputs,
      max_new_tokens=48,
      temperature=0.2,
      do_sample=True,
      pad_token_id=50256,
      streamer=streamer
    )

    print(tokenizer.decode(tokens[0], skip_special_tokens=True))

Change max_new_tokens to something larger to get output of any real length. I added TextStreamer to visualize the output as it's generated; generation is still slow with MPS, but it's definitely using it.

@ymgenesis thanks so much for this!
After using your solution, I ran into another issue:

RuntimeError: MPS does not support cumsum op with int64 input

Got it working on my M1 MacBook Pro by following this solution:
pip3 install --upgrade --no-deps --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Ref: https://github.com/pytorch/pytorch/issues/96610#issuecomment-1593230620
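
For reference, a quick check along these lines (the version-string format and op behaviour below are my assumptions, not something stated in this thread) can confirm the reinstalled nightly build is the one in use and that cumsum no longer fails on MPS:

import torch

# A nightly build reports a dev version string, e.g. "2.x.0.devYYYYMMDD"
print(torch.__version__)
# True on Apple Silicon with macOS 12.3+ and an MPS-enabled build
print(torch.backends.mps.is_available())
# torch.arange defaults to int64, so this exercises the op that previously raised
# "MPS does not support cumsum op with int64 input"
print(torch.arange(5, device="mps").cumsum(0))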


You could use torch.float16 in place of torch.bfloat16; bfloat16 currently runs only on CPU and CUDA.
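
A minimal sketch of that change, reusing the model name from the first post (only the torch_dtype argument differs):

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-instruct-alpha-3b",
  trust_remote_code=True,
  torch_dtype=torch.float16,  # float16 instead of bfloat16, which MPS does not accept
)
model = model.to("mps")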

@LinyiZheng how can you change from bfloat16 to float16 in the terminal?
