Run inference on 2 GPUs
#112
by bweinstein123 - opened
Hi,
I have two RTX 6000 GPUs, but I can't figure out how to run the following code on both GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.half().cuda()  # casts to fp16 and moves the whole model onto a single GPU

text = "Hello, my name is"  # example prompt; `text` was undefined in the original
inputs = tokenizer(text, return_tensors="pt")
inputs_gpu = {key: value.to("cuda") for key, value in inputs.items()}
outputs = model.generate(**inputs_gpu, max_new_tokens=500)
Hi @bweinstein123,
Please see my comment here: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/discussions/36#65b8d5cf23d948d884d19645 to understand how to run multi-GPU inference.