inferring device map for model
#4
by mahdi-b - opened
I am trying to load the pre-trained model using device_map="auto",
but I get an error saying that:
...
File ~/anaconda3/envs/esm_2/lib/python3.9/site-packages/transformers/modeling_utils.py:2406, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2404 # Dispatch model with hooks on all devices if necessary
2405 if device_map is not None:
-> 2406 dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index)
2408 if output_loading_info:
2409 if loading_info is None:
TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'
It looks like the installed version of dispatch_model
indeed does not accept an 'offload_index' argument. I tried removing offload_index
from the function call, but that crashed my server.
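For what it's worth, this error usually points to a version mismatch: the installed transformers passes offload_index, but the installed accelerate predates that argument. One thing to try (an assumption about the cause, not a confirmed fix) is upgrading both libraries to matching recent versions:

```shell
# Assumption: the installed accelerate predates dispatch_model's
# offload_index argument. Upgrade both libraries together:
pip install -U accelerate transformers

# Print the resulting versions to confirm the upgrade took effect:
python -c "import accelerate, transformers; print(accelerate.__version__, transformers.__version__)"
```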
Also, it seems that I cannot even generate a meaningful device map for the Facebook ESM model. Trying the following:
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModel
config = AutoConfig.from_pretrained("facebook/esm2_t6_8M_UR50D")
with init_empty_weights():
    model = AutoModel.from_config(config)
device_map = infer_auto_device_map(model)
device_map
returns
{'': 0}
While this is not an issue for a small model like facebook/esm2_t6_8M_UR50D, I am afraid the larger models (3B or 15B) will not be usable unless the weights can be split across GPUs. Any thoughts about the issue above would be greatly appreciated.
Thank you!