ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.
ValueError Traceback (most recent call last)
Cell In[17], line 25
22 print_trainable_parameters(model)
24 # Apply the accelerator. You can comment this out to remove the accelerator.
---> 25 model = accelerator.prepare_model(model)
File /vc_data/shankum/miniconda3/envs/llm2/lib/python3.11/site-packages/accelerate/accelerator.py:1392, in Accelerator.prepare_model(self, model, device_placement, evaluation_mode)
1389 if torch.device(current_device_index) != self.device:
1390 # if on the first device (GPU 0) we don't care
1391 if (self.device.index is not None) or (current_device_index != 0):
-> 1392 raise ValueError(
1393 "You can't train a model that has been loaded in 8-bit precision on a different device than the one "
1394 "you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}"
1395 )
1397 if "cpu" in model_devices or "disk" in model_devices:
1398 raise ValueError(
1399 "You can't train a model that has been loaded in 8-bit precision with CPU or disk offload."
1400 )
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
Are the tokens/output from your tokenizer on the same device your model is loaded on?
If your model is on GPU, make sure you move the token tensors to GPU as well.
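To illustrate the point above, here is a minimal sketch of moving tokenizer output to the model's device before a forward pass. The `batch` dict is a hand-made stand-in for what `tokenizer(text, return_tensors="pt")` returns in transformers; the device-selection fallback to CPU is just so the snippet runs anywhere.

```python
import torch

# Pick the device the model lives on (falls back to CPU when no GPU is present).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for tokenizer output: a dict of tensors, shaped like what
# tokenizer(text, return_tensors="pt") returns in transformers.
batch = {
    "input_ids": torch.tensor([[101, 2023, 102]]),
    "attention_mask": torch.tensor([[1, 1, 1]]),
}

# Move every tensor in the batch to the model's device before the forward pass.
batch = {k: v.to(device) for k, v in batch.items()}
```

After this, `model(**batch)` runs with inputs and weights on the same device, avoiding a device-mismatch error.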
This is an accelerate issue in my multi-GPU setup. I have used the same setup with other SLMs like Zephyr and Llama 2, and they seem to work.
Playing around with the accelerate settings fixed it for me.
@madhurjindal what kind of hardware are you using, and how many GPUs?
@saireddy I am using 8xV100 32GB
@madhurjindal can you try using 4 of those rather than 8? It's funny, but this fixed it for me.
same issue
@saireddy that didn't fix it for me.
> Playing around with the accelerate settings fixed it for me.

Could you elaborate more, please? Thanks.
Hi everyone!
To fix this issue, make sure to force-load the model onto a single GPU per process and replicate it across all GPUs. To achieve this, please follow the solution proposed here: https://github.com/huggingface/accelerate/issues/1840#issuecomment-1683105994
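The gist of that linked workaround is to replace `device_map="auto"` (which shards the 8-bit model across GPUs) with a map that pins the entire model to the current process's GPU. A hedged sketch, assuming the script is launched with `accelerate launch` (which sets `LOCAL_RANK`); the commented `from_pretrained` call uses a placeholder model id and is illustrative, not from the thread:

```python
import os

# Each process launched via `accelerate launch` gets its own LOCAL_RANK.
# Default to 0 so the snippet also works in a single-process run.
local_rank = int(os.environ.get("LOCAL_RANK", 0))

# The empty-string key means "place the entire model on this one device",
# so every process holds a full replica instead of a cross-GPU shard.
device_map = {"": local_rank}

# Hypothetical usage with transformers (model id is a placeholder):
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "some/model-id",
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map=device_map,
# )
```

With one full replica per GPU, `accelerator.prepare_model(model)` no longer sees the model split across devices, which is what triggered the ValueError above.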