Running the model on the MPS backend (MacBook GPUs) hangs indefinitely

#4 opened by ggaabe

I'm trying to run the GPU example on a MacBook M2 Max, but calling model.to("mps") simply hangs forever with no error message:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16)

print("Device 0:", model.device)  # prints: Device 0: cpu
mps_device = torch.device("mps")
print("Device 1:", mps_device)  # prints: Device 1: mps
model = model.to(mps_device)  # hangs here forever; the next line is never reached
print("Device 2:", model.device)

Any thoughts on what I could do to get this to run with GPU inference?

PyTorch Version: 2.1.0.dev20230505

How much GPU memory does your M2 Max have?

My bad. Something was probably misconfigured in my environment, though I have no idea what. I had installed all the packages with pip; I retried with conda and everything worked.
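
For anyone who hits the same hang, a quick sanity check can rule out a broken MPS install before moving a whole model over. This is a minimal sketch using PyTorch's torch.backends.mps API; the tiny tensor round-trip is just a smoke test and is not from the original thread:

import torch

# Both should print True on an Apple-silicon Mac with a recent PyTorch build.
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    # Tiny tensor round-trip: if this hangs or errors, the environment
    # (not the model) is the problem.
    x = torch.ones(3, device="mps")
    print((x * 2).to("cpu"))

If either check fails or the round-trip hangs, reinstalling PyTorch in a fresh environment (as above, conda worked where pip did not) is a reasonable next step.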
